A peer-reviewed electronic journal.

Copyright is retained by the first or sole author, who grants right of first publication to the Practical
Assessment, Research & Evaluation. Permission is granted to distribute this article for nonprofit, educational
purposes if it is copied in its entirety and the journal is credited.

Volume 11 Number 10, November 2006 ISSN 1531-7714

The Reliability, Validity, and Utility of Self-Assessment

John A. Ross
University of Toronto

Despite widespread use of self-assessment, teachers have doubts about the value and accuracy of
the technique. This article reviews research evidence on student self-assessment, finding that (1)
self-assessment produces consistent results across items, tasks, and short time periods; (2) self-
assessment provides information about student achievement that corresponds only in part to the
information generated by teacher assessments; (3) self-assessment contributes to higher student
achievement and improved behavior. The central finding of this review is that (4) the strengths of
self-assessment can be enhanced through training students how to assess their work and each of
the weaknesses of the approach (including inflation of grades) can be reduced through teacher
action.

A large proportion of teachers (76% in Noonan & Duncan, 2005) reports using self-assessment at
least part of the time, even though teachers express doubt about the value and accuracy of student
self-appraisals. The doubts center on the concern that students may have inflated perceptions of
their accomplishments and that they may be motivated by self-interest. Frequently heard is the
claim that the “good kids” under-estimate their achievement while confused learners, who do not
know what successful performance requires, over-estimate their attainments. These concerns
suggest, from a measurement perspective, that self-assessment introduces construct-irrelevant
variance that threatens the validity of grading.

In this article I will examine research conducted on self-assessment for the purpose of addressing
these practical questions posed by teachers:

1. Is self-assessment a reliable assessment technique?

2. Does self-assessment provide valid evidence about student performance?

3. Does self-assessment improve student performance?

4. Is self-assessment a useful student assessment technique?

Definitions

For the purpose of this article, I will follow Klenowski’s (1995) definition of self-assessment as
“the evaluation or judgment of ‘the worth’ of one’s performance and the identification of one’s
strengths and weaknesses with a view to improving one’s learning outcomes” (p. 146). This
definition emphasizes the ameliorative potential of self-assessment and focuses attention on its
consequential validity. Although some of the research conducted on self-assessment has consisted
of students appraising their work with little interpretative guidance, I will argue with Klenowski
that the benefits of self-assessment are more likely to accrue when three conditions are met:
teacher and students negotiate self-assessment criteria, teacher-student dialogue focuses on
evidence for judgments, and self-assessments contribute to a grade (by students alone or in
collaboration with teachers).

Although self-assessment has long been part of the repertoire of classroom teachers, assessment
reform has increased its use. Key proponents of assessment reform (e.g., Wiggins, 1993) recommend
that students submit a self-assessment with every major assignment. Self-assessment is a valid
instance of assessment reform (as defined by Aschbacher, 1991; Newman, 1997; Wiggins, 1993; 1998)
in that (i) students create something that requires higher level thinking (i.e., they interpret
their performance using overt criteria); (ii) the task requires disciplined inquiry (i.e., the
criteria for appraisal are derived from a specific discipline); (iii) the assessment is transparent
(i.e., procedures, criteria and standards are public); and (iv) the student has opportunities for
feedback and revision during the task (e.g., by responding to discrepancies between the student’s
and teacher’s judgment). Other important features of assessment reform, e.g., the extent to which
the task represents real world applications of school knowledge, characterize some but not all
self-assessments.

Some teachers find it helpful to distinguish between self evaluation (judgments that are used for
grading) and self assessments (informal judgments about attainment) as suggested by Gregory,
Cameron, and Davies (2000). Not everyone finds the distinction helpful; for example, the text on
classroom assessment by McMillan (2004) uses the terms interchangeably. Throughout this article, I
will use the term self-assessment to refer to both formative and summative data collections.

The term self-assessment is also used in the metacognition literature to refer to the judgments an
individual makes on the basis of self-knowledge (Bransford, Brown, & Cocking, 1999). My review will
focus on self-assessments conducted in classroom settings and will touch only briefly upon findings
from lab investigations. For an extensive review of self-assessment in the context of
metacognition, see Sundstrom (2005).

Why Teachers Use Self-Assessment

When asked why they include self-assessment in their student assessment repertoires, teachers give
a variety of responses. (1) Most frequently heard is the claim that involving students in the
assessment of their work, especially giving them opportunities to contribute to the criteria on
which that work will be judged, increases student engagement in assessment tasks. (2) Closely
related is the argument that self-assessment contributes to variety in assessment methods, a key
factor in maintaining student interest and attention. (3) Other teachers argue that self-assessment
has distinctive features that warrant its use. For example, self-assessment provides information
that is not easily determined, such as how much effort students expended in preparing for the task.
(4) Some teachers argue that self-assessment is more cost-effective than other techniques. (5) Still
others argue that students learn more when they know that they will share responsibility for the
assessment of what they have learned.

Practical Questions Addressed by Researchers

Is self-assessment a reliable assessment technique?

Reliability, meaning the consistency of the scores produced by a measurement tool, can be determined
in many ways. The internal consistency of self-assessments is typically high. For example, J. Ross,
Rolheiser and Hogaboam-Gray (2002-b) had grade 5-6 students rate their performance on a 1-10 scale
for each of five dimensions of mathematical problem solving. The internal consistency was .91.
Similar results were obtained for grade 4-6 self-assessments in English (alpha=.84 in J. Ross,
Rolheiser and Hogaboam-Gray, 1999). There is also evidence of consistency across tasks. Fitzgerald,
Gruppen, and White (2000) examined the self-assessments of medical students across two task
formats: performance tasks (examination of standardized patients) and cognitive tasks (interpreting
vignettes or test results). They found that students' self-assessments were consistent over a range
of skills and tasks.

Less frequently examined is consistency between one time period and another. Blatchford (1997)
found mixed evidence for long time periods. Blatchford reported that self-assessments were stable
between ages 11 and 16 in mathematics, although not in English, a finding Blatchford attributed to
feedback being less clear in English class than in mathematics. Blatchford found there was little
agreement of self-assessments between ages 7 and 11 in either subject. There is greater reliability
when the time periods are shorter. Sung, Chang, Chiou, and Hou (2005) had 14-15 year olds assess the
quality of their web-designs on three occasions within a narrow time frame: after completing their
designs, after viewing the designs of others in their own group and after viewing the best and
worst designs in the class. Sung et al. found no significant differences across occasions.

In summary, the evidence in support of the reliability of self-assessment is positive in terms of
consistency across tasks, across items, and over short time periods. The studies showing adequate
consistency involved students who had been trained in how to evaluate their work. There was less
consistency over longer time periods, particularly involving younger children, and there were
variations among subjects.

Does self-assessment provide valid evidence about student performance?

Validity in self-assessment typically means agreement with teacher judgments (considered to be the
gold standard) or peer rankings (usually the mean of multiple judges, which tend to be more accurate
than the results from a single judge). Research on the self-assessments of university students
produced mixed results. Boud and Falchikov (1989) reviewed 48 studies reporting self-teacher
assessment agreement. In most, self-assessments agreed with teachers' ratings but the reviewers
expressed concern about the quality of many of the studies. There was extensive variation about
what constituted agreement; the criteria used by teachers and students were frequently not defined;
there were few replications involving comparable groups of students; some studies combined effort
with achievement in a single rating; self-grading was not defined (e.g., it could be what a student
deserves or what he or she expects to get). S. Ross (1998) also found mixed results for self-teacher
agreement in studies of second language learning. He found a mean correlation of r=.64 (N=60
correlations) with wide variation among studies.

Student self-assessments are generally higher than teacher ratings, although exceptions have been
reported (e.g., Aitchison, 1995 for middle school music students). Over-estimates are more likely to
be found if the self-assessments contribute to the student’s grade in a course (Boud & Falchikov,
1989). Young children may over-estimate because they lack the cognitive skills to integrate
information about their abilities and are more vulnerable to wishful thinking. Butler (1990) found
that self-teacher agreement increased from age 5 to 7 to 10 (correlations were r=.16, .38, .83
respectively). Agreement of teacher and student assessments is also higher when students have been
taught how to assess their work (J. Ross et al., 1999; Sung et al., 2005), when students have
knowledge of the content of the domain in which the task is embedded (Longhurst & Norton, 1997; S.
Ross, 1998), when learners know that their self-assessments will be compared to peer or supervisor
ratings (Fox & Dinur, 1988), and when the application of the assessment criteria involves low level
inferences (Pakaslahti & Keltikangas-Järvinen, 2000 found greater teacher-self agreement for direct
than for indirect measures of adolescent aggression).

Agreement of self-assessment with peer judgments is generally higher than self-teacher agreement
(Bergee, 1997; McEnery & Blanchard, 1999). One explanation might be that students interpret
assessment criteria differently than their teachers, for example, focusing on superficial features
of the performance.
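
The reliability and agreement figures cited in this review (for example, an internal consistency of .91 and self-teacher correlations such as r=.64) are standard psychometric statistics. As a rough illustration only, the sketch below computes Cronbach's alpha for a set of item ratings and a Pearson correlation between self and teacher scores. The data, variable names, and function names are hypothetical, and the reviewed studies may have computed their coefficients differently.

```python
from statistics import mean, variance


def cronbach_alpha(ratings):
    """Internal consistency (Cronbach's alpha) of item ratings.

    ratings: one row per student, one column per rated dimension (item).
    """
    k = len(ratings[0])                                # number of items
    item_columns = list(zip(*ratings))                 # regroup scores by item
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: five students rate themselves on five dimensions (1-10),
# and a teacher gives each student a total score on the same 5-50 range.
self_ratings = [
    [7, 8, 6, 7, 8],
    [4, 5, 5, 4, 3],
    [9, 9, 8, 10, 9],
    [6, 5, 6, 7, 6],
    [3, 4, 2, 3, 4],
]
teacher_totals = [33, 24, 41, 31, 20]

alpha = cronbach_alpha(self_ratings)
self_totals = [sum(row) for row in self_ratings]
r = pearson_r(self_totals, teacher_totals)
print(f"alpha = {alpha:.2f}, self-teacher r = {r:.2f}")
```

A high alpha indicates that students rate themselves consistently across the dimensions; a high r indicates that students who rate themselves higher also tend to receive higher teacher scores. Neither statistic shows whether self-ratings are inflated relative to teacher ratings, which requires comparing the means.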

Fewer studies have examined other forms of validity, such as agreement with an objective criterion.
Blatchford (1997) compared student self-assessments to standardized tests, finding that age
moderated the relationship. Self-assessments were significantly correlated with achievement at age
16 but not at 7 years. A related literature examined the accuracy of recall of the results of
standardized tests by university students. These studies found high correlations (r = .87 to .97)
when students self-reported GPA and SAT scores when applying to graduate school (Cassady, 2001;
Talento-Miller & Peyton, 2006); i.e., in conditions in which their self-reports could be easily
checked against official documents. In broader settings, Kuncel, Credé and Thomas (2005) found that
high achievers reported their college and high school grades accurately but accuracy was diminished
for lower achievers, most likely by social desirability or self-enhancement factors. The
correlations of self-assessments with external measures in the metacognition literature are mixed;
for example, Ackerman, Beier and Bowen (2002) reported correlations ranging from r = -.07 to r = .68
across six subjects. Metacognition researchers found that the correlations of self-appraisals with
external judgments were higher for younger than older children (Kaderavek, Gillam, Ukrainetz,
Justice, & Eisenberg, 2004), for upper than middle or lower SES students (Pappas, Ginsburg, &
Jiang, 2003), and for boys than girls (Phillips & Zimmerman, 1990).

These studies suggest that self-assessments provide information about student achievement that
corresponds only in part to the information generated by teacher assessments. The unexplained
variation in self-teacher agreement has multiple sources, especially student inability to apply
assessment criteria, interest bias, and the unreliability of teacher assessments. One systemic
source of error might be that students include in their self-assessments information that is not
available to the teacher, peers or standardized tests. As one student put it:

    The teacher only knows so much of how much effort you put into it. She has to look over the
    whole class. You know personally how hard you worked on it and how you worked at home or if you
    were just goofing off. (J. Ross, Rolheiser, & Hogaboam-Gray, 1998, p. 470)

In summary, the evidence about the concurrent validity of self-assessments is mixed. However, the
review suggests that discrepancies between self-assessments and scores on other measures should be
the stimulus for further inquiry, an invitation to review the evidence embedded in the learner’s
performance that might reveal student strengths and learning needs not addressed by the formal
criteria.

Does self-assessment improve student performance?

Consequential validity is the argument that the worth of a test is determined by its consequences
for students and others. For example, a valid assessment is one that contributes to student
learning; if the assessment has a negative effect on student learning, the test is invalid (Moss,
1998). The inclusion of consequences as a dimension of test validity is a key element of student
assessment reform.1

A few studies have demonstrated that asking students to assess their performance, without further
training, contributes to higher self-efficacy, greater intrinsic motivation, and stronger
achievement (Hughes, Sullivan, & Mosley, 1985; Schunk, 1996; Sparks, 1991). Other research found
achievement outcomes in programs in which self-assessment was one of many treatment elements,
although its unique contribution could not be isolated. For example, Fontana and Fernandez (1994)
found large achievement benefits for mathematics students aged 8-14 in a program in which
self-assessment was one of multiple strategies for increasing student control of learning.

Other studies have focused on the effects of training students how to assess their work. Although
treatments vary, self-assessment training typically consists of systematic instruction in each of
the elements that define self-assessment. For example, we applied strategies for teaching
self-assessment in four stages: (i) involve students in defining assessment criteria (e.g., with
teacher assistance construct a rubric that expresses
performance expectations and student criteria in language meaningful to students), (ii) teach
students how to apply the criteria (e.g., model application of the rubric by assessing examples of
performance), (iii) give students feedback on their self-assessments (e.g., engage students in
evidence-based discussions of the differences between their self-assessments and assessments by
peers or the teacher), and (iv) help students use assessment data to develop action plans (e.g.,
find trends in performance and identify short and long term strategies for overcoming weaknesses).
Students trained in these processes over 8-12 week periods outperformed control samples in grade 4-6
narrative writing (J. Ross et al., 1999), grade 5-6 mathematics problem solving (J. Ross et al.,
2002-b), and grade 11 geography (Ross & Starling, 2005). Effect sizes were of small to medium size;
i.e., ES=.58 (for weaker writers), .40, and .50 respectively.

Less extensive training programs also produced positive results. In a review of six studies of
secondary school student writing, Hillocks (1986) found that providing students with rating scales
to assess their work improved writing quality. Arter, Spandel, Culham, and Pollard (1994) followed a
similar strategy in which they gave grade 5 students scales to assess essay writing; treatment
students outperformed controls (but on only one of the writing traits measured). Similarly, Andrade
and Boulay (2003) found that teaching grade 7-8 students how to use a rubric for self-assessment
improved writing performance, although the effects were limited to females for one of two writing
genres. McDonald and Boud (2003) found positive achievement effects for self-assessment across a
range of subjects.

Positive effects for self-assessment have also been reported for non-academic outcomes. J. Ross
(1995) provided grade 7 students with transcripts of their conversations when working in
cooperative groups and a coding scheme for interpreting the quality of their interactions. Students
used the coding scheme to assess the frequency of help giving and help seeking in their groups.
Self-assessment contributed to increases in positive interactions and a decline in off-task
behavior. Henry (1994) developed a self-assessment tool for K-grade 1 students in which they
compared what they planned to do with what they did, using a check mark to indicate if actions
matched plans. Henry found that use of the tool over 12 days contributed to higher student
self-direction, but only for those with low self-direction on the pretest. Nelson, Smith, and
Colvin (1995) devised a treatment in which disruptive grade 2 students were taught how to observe
their playground behavior, make judgments about it, and obtain feedback from an adult.
Self-assessment, when combined with other treatment elements, reduced disruptive behavior in the
trained setting and in near transfer.

A few negative outcomes of self-assessment have been reported. J. Ross, Rolheiser, and
Hogaboam-Gray (2002-a) reported a case study of self-assessment in a grade 11 mathematics classroom
that resulted in reduced achievement compared to a control class taught by the same teacher
(ES=-.35). Interview data suggested that student self-assessments persuaded students that they did
not understand core mathematics ideas, even though they were working hard to learn them. The
conclusion that many students drew was that they lacked the ability to do advanced level
mathematics. Some responded immediately with ego-protecting effort reduction. Others resolved to
move out of the advanced mathematics stream in the next school year. However, the credibility of
the study was marred by ability differences between the classes, which were resolved only partially
by statistical adjustments. Aitchison (1995) assigned grade 7-8 students to a variety of assessment
conditions. Students in the self-assessment section scored lower on two instrumental music tasks
than students in the assessment by teacher alone condition or in a condition in which students and
teacher collaborated on the assessment. However, in the Aitchison study students completed
self-assessments on only three occasions and they received no feedback on the accuracy of their
judgments.

On balance, the research evidence suggests that self-assessment contributes to higher student
achievement and improved behavior. Figure 1 adapted from J. Ross et al. (2002-a) provides an
explanation for the findings based on social cognition theory (Bandura, 1997).
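
The effect sizes quoted in this section (ES=.58, .40, .50, and -.35) compare a trained group with a control group. The article does not say which formula was used, so the sketch below assumes a standardized mean difference of the Cohen's d kind, with a pooled standard deviation. The scores and the function name are hypothetical and serve only to show how a positive or negative ES arises.

```python
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using a pooled standard deviation.

    Assumption: this is one common definition of an effect size; the
    reviewed studies may have used a different variant.
    """
    n1, n2 = len(treatment), len(control)
    pooled_variance = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / pooled_variance ** 0.5


# Hypothetical post-test scores for a self-assessment class and a control class.
trained_scores = [72, 65, 80, 77, 69, 74]
control_scores = [68, 61, 75, 70, 66, 71]

es = cohens_d(trained_scores, control_scores)
print(f"ES = {es:.2f}")  # positive when the trained group scores higher
```

Under this definition the sign carries the direction of the effect: a negative value, as in the grade 11 mathematics case study, means the self-assessment class scored below the control class.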

Figure 1: How Self-Assessment Contributes to Learning (adapted from Ross et al., 2002-a)

Self-assessment embodies three processes that self-regulating students use to observe and interpret
their behavior (Schunk, 1996). First, students produce self-observations, deliberately focusing on
specific aspects of their performance related to their subjective standards of success. Second,
students make self-judgments in which they determine how well their general and specific goals were
met. Third are self-reactions, interpretations of the degree of goal achievement that express how
satisfied students are with the result of their actions. Training in self-assessment has an impact
on students’ self-assessments by focusing student attention on particular aspects of their
performance (e.g., the dimensions of the co-constructed rubric), by redefining the standards
students use to determine whether they were successful (e.g., the levels of the rubric), and by
structuring teacher feedback to reinforce positive reactions to the accurate recognition of
successful performance. These influences of self-assessment training increase the likelihood that
students will interpret their performance as a mastery experience, the most powerful source of
self-efficacy information (Bandura, 1997).

Self-assessment contributes to self-efficacy beliefs, i.e., student perceptions of their ability to
perform the actions required by similar tasks likely to be encountered in the future. Students who
perceive themselves to have been successful on the current task (i.e., who recognize it as a
mastery experience) are more likely to believe that they will be successful in the future (Bandura,
1997). Self-assessment training also contributes to self-efficacy through vicarious experience
(i.e., classroom discussions of exemplars provide examples of successful experience by students’
peers). In addition, the willingness of teachers to share control of assessment constitutes an
“inviting message”; i.e., information that the teacher perceives students to be able and
responsible, an important source of positive efficacy information (Usher & Pajares, 2005).

Students with greater confidence in their ability to accomplish the target task are more likely to
visualize success than failure. They set higher standards of performance for themselves. Student
expectations about future performance also influence effort. Confident students persist. They
are not depressed by failure but respond to setbacks with renewed effort. For example, students
with high self-efficacy interpret a gap between aspiration and outcome as a stimulus while low
self-efficacy students perceive such a gap as debilitating evidence that they are incapable of
completing the task (Bandura, 1997). The combination of higher goals and increased effort
contributes to higher achievement.

Positive self-assessments foster an upward cycle of learning, as demonstrated by the studies that
found positive outcomes for self-assessment. But the processes in Figure 1 can generate negative
outcomes, as found in J. Ross et al. (2002-a). A stream of negative self-assessments can lead
students to select personal goals that are unrealistic, adopt learning strategies which are
ineffective, exert low effort and make excuses for performance (Stipek, Recchia, & McClintic,
1992).

Is self-assessment a useful student assessment technique?

Strengths. There is ample evidence that self-assessment contributes to student achievement (Hughes
et al., 1985; Schunk, 1996; Sparks, 1991), particularly if teachers provide direct instruction in
how to self-assess (e.g., J. Ross et al., 1999; 2002-a; 2005). There is also evidence that
self-assessment contributes to improved student behavior (Henry, 1994; Nelson et al., 1995; J.
Ross, 1995).

Some data suggest that students prefer self-assessment to assessment by the teacher alone. The
reasons given by grade 5-11 students why they preferred self-assessment suggest additional benefits
of self-assessment: 1) students said that with self-assessment they had a better understanding of
what they were supposed to do because they were involved in setting the criteria for the
assessment; 2) students argued the self-assessment was fairer because it enabled them to include
important performance dimensions, such as effort, that would not usually be included in their
grade; 3) self-assessment enabled them to communicate information about their performance (e.g.,
their goals and reasoning) that was not otherwise available to their teacher; 4) self-assessment
gave them information they could use to improve their work (J. Ross et al., 1998). These changes in
perceptions of the value of assessment through greater involvement in the process might reduce the
trend reported by Paris, Lawton, Turner and Roth (1991) in which students become increasingly
cynical about the validity and value of assessment as they move through the school system.

Self-assessment encourages students to focus on their attainment of explicit criteria, rather than
normative comparisons to other students (although Blatchford’s 1997 procedure is an exception). For
example, when a grade four student in a classroom that used self-assessment extensively was asked
what she compared her work to, she reported, “I usually compare it to my own work because not other
people’s marks are going on my report card…so I need to see if I improved” (J. Ross, Rolheiser, &
Hogaboam-Gray, 2002-c, p. 92). The same study found that student conversations about
self-assessment were much less focused on marks than their conversations about assessments by the
teacher, even though both types of assessment contributed to the final grade.

Relatively little research has been conducted on the benefits of self-assessment for other groups.
Teachers might benefit from self-assessment to the extent that making assessment criteria explicit
to students might help teachers clarify their intentions and distinguish essential from less
important features of student performance. More focused teaching might result. Teacher-student
conferences to resolve discrepancies between self- and teacher-assessments might give teachers
insights into student thinking, especially student misconceptions that impede further learning.
Subsequent instruction might explicitly address deficiencies revealed in the conference. Little has
been written about parent reactions to self-assessment. However, the construction of rubrics using
language meaningful to students might also make the goals of the curriculum more accessible to
parents and the meaning of expected standards more transparent.

Weaknesses. The number one concern of teachers about self-assessment is the fear that sharing
control of assessment with students will lower standards and reward students who inflate their
assessments. Lack of agreement of self-assessment
with teacher appraisals has many causes. Some are Academic rubrics are time consuming to produce.
errors of innocence. For example, students In addition, web-based repositories are not easily
conducting retrospective assessments of their work accessed; the criteria may too general or too
may not recall what they did; they may be unable to numerous; there may be insufficient delineation
accurately assess their production because they have between levels and descriptors may be too brief or
no idea what high performance looks like or they too long (Dornisch & Sabatini McLoughlin, 2006).
may not understand the assessment criteria or they However, behavioral rubrics are more familiar to
may lack the deductive skills involved in applying students and easier to use. For example, “comes
the criteria to their work. But the greatest teacher prepared to class” is a criterion that is easier for
concern is that “mark sharks” will intentionally teachers to describe and is more easily applied than
inflate their achievement, lying about their effort or a criterion like “demonstrates conceptual
misapplying the criteria. Students are also concerned understanding”.
about their ability to self-assess and of the potential
for cheating. As one noted, “People could just take Finally, teachers are concerned about parent
advantage of it and just mark all perfect when it’s reactions to self-assessment. Some parents expect
really not their best” (J. Ross et al., 1998). teachers to take sole responsibility for assessment
decisions. In a study of conferences involving
Even though students prefer self-assessment to students, teachers and parents Blake (2000) found
teacher appraisal alone, such participation is more that parents expected teachers to “sign off” on
work for students. Some describe it as boring (J. student self-assessments, confirming their validity.
Ross et al., 1998) and argue that it is unfair to ask
them to do the teacher’s job. Teachers express Making Self-Assessment More Useful
concern about the lack of student commitment to
the process, arguing that self-assessment will not Teachers who are concerned about the inaccuracy
work if students do not put the required effort into of self-assessment may be partially reassured by the
it. research evidence about the psychometric
properties of self-assessment. The concern is likely
In some jurisdictions, self-assessments may not to remain. Improvement in the utility of self-
be used in the determination of the student’s final assessment is most likely to come from attention to
grade. For example, in one Canadian province “Self four dimensions in training students how to assess
Assessment … should not be used to inform their their work.
report card grade or mark” (Ontario Ministry of
Education and Training, p. 463, emphasis in the First, the process for defining the criteria that
original). However, self-assessments may be students use to assess their work will improve the
included in a separate learning skills section of the reliability and validity of assessment if the rubric
report card. This policy requirement reduces teacher uses language intelligible to students, addresses
willingness to use self-assessment and may depress competencies that are familiar to students, and
student motivation to take the task seriously if they includes performance features they perceive to be
know that it does not count toward their grades. important. Rolheiser (1986) suggested several
strategies for engaging students in the construction
Although most research on self-assessment of simple rubrics. A key message in Rolheiser’s
focuses on its contribution to academic manual is that teachers should not surrender control
achievement, some teachers use self-assessment of assessment criteria but enact a process in which
only to measure social skills and to encourage students develop a deeper understanding of key
compliance with classroom rules. One explanation expectations mandated by governing curriculum
might be the previously noted policy impediments guidelines. Offering to expand the rubric to include
to using self-assessment for academic grading. In additional “kid-criteria” contributes to student
addition, teachers may find it more challenging to commitment. In addition to focusing student
engage students in constructing rubrics for attention on specific aspects of a domain, the
academic performance than for behavioral goals. construction of a rubric also provides students with
a language for talking about their learning. In some instances, a process of progressive
revelation of the rubric may be appropriate, if students lack sufficient experience in the domain
to be able to identify dimensions of mastery.

Second, teaching students how to apply the criteria also contributes to the credibility of the
assessment and student understanding of the rubric. Among the more powerful strategies are teacher
explanations of each criterion, teacher modeling of criteria application, and student practice in
applying the rubric to examples of student work (including their own). Within-lesson comments that
link instructional episodes and student tasks to assessment criteria reinforce student
understanding of the criteria.

Third, giving students feedback on their self-assessments is a process of triangulating student
self-assessments with teacher appraisals and peer assessments of the same work using the same
criteria. Conferencing with individuals and groups to resolve discrepancies can heighten attention
to evidence, the antidote to lying and self-delusion. A key issue is to help students move from
holistic to analytic scoring of their work. For example, student self-assessments are frequently
driven by their perception of the effort expended on the assignment, an important criterion, but
one that should not swamp attention to other dimensions of performance.

Fourth, students need help in using self-assessment data to improve performance. Student
sophistication in processing data improves with age. For example, J. Ross et al. (2002c) found
that when discussing assessments with parents and peers, grade 6 students were more likely to
focus on evidence of achievement and how to improve performance, whereas grade 2-4 students
focused exclusively on the overall grade. In addition, older students were more likely than
younger students to compare current to past achievement on similar tasks. Teachers can provide
simple recording forms for tracking performance over time to compensate for memory loss. Teachers
can provide games, conferences, and menus of examples to support goal setting. Goals are more
likely to improve student achievement if they are set by students themselves, are specific,
attainable with reasonable amounts of effort, focus on near as opposed to distant ends, and link
immediate plans to longer-term aspirations. Recording goals in a contract increases
accountability. Teachers can also address student beliefs that contribute to higher goal setting,
such as attributions for success and failure and seeing ability as something that can improve
rather than as a fixed entity.

Teachers can further support self-assessment by creating a climate in which students can publicly
self-assess. Strategies for creating trust in the classroom are readily available (e.g., Johnson,
Johnson, & Holubec, 1990). The usefulness of self-assessment is likely to be enhanced by
strategies that shift students toward learning goals (i.e., students approach classroom tasks in
order to understand key ideas) and away from performance goals (i.e., students approach classroom
tasks in order to demonstrate they are smarter than their peers).

Conclusions

There is sufficient information from research on self-assessment to answer the questions posed at
the outset of this article with reasonable confidence. The psychometric properties of
self-assessment suggest that it is a reliable assessment technique producing consistent results
across items, tasks and contexts and over short time periods. Teachers can strengthen reliability
through such strategies as engaging students in rubric construction. Evidence of the validity of
self-assessment is mixed. Self-assessments are typically higher than teacher assessments, although
the size of the discrepancy can be reduced through student training and other teacher actions. In
addition, differences between self- and teacher-assessment can lead to productive teacher-student
conversations about student learning needs. There is persuasive evidence, across several grades
and subjects, that self-assessment contributes to student learning and that the effects grow
larger with direct instruction on self-assessment procedures. The central finding of this review
is that the strengths of self-assessment can be enhanced through specific student training and
each of the weaknesses of the approach (including inflation of grades) can be reduced through
teacher action.

Teachers can learn more about how to teach students the skills of self-assessment by consulting
manuals (e.g., Rolheiser, 1996) and through in-service activities. Little research has been
conducted on strategies for training teachers in this domain, with the exception of J. Ross et al.
(1998). This study compared two in-service options: a skills training approach (researchers
presented strategies for teaching self-assessment) and an action research model (teacher
developers of a self-assessment program served as mentors for in-service participants to develop
their own programs). The study found that the action research version of the in-service had a more
positive effect than the other condition on student attitudes toward assessment, in part because
the learner control of the action research approach to in-service was congruent with sharing
control with students in self-assessment.

The research reviewed in this article provides teachers with evidence that self-assessment, when
properly implemented, produces valid and reliable information about student achievement. The
optimal use of this information is for formative purposes, providing credible data on student
cognitions about their achievement that are not otherwise available to teachers. Teachers who make
a serious commitment to learning about self-assessment and teaching these techniques to their
students can plausibly anticipate enhanced student motivation, confidence, and achievement.

Notes:

1 Preparation of this article was supported by a grant from the Social Sciences and Humanities
Research Council of Canada. The views expressed in this article do not necessarily represent the
views of the Council.

2 Not all supporters of assessment reform subscribe to consequential validity. Popham (1997)
argued that the consequences of assessment should be a major concern of right-minded people but
cluttering the concept of validity with issues of the effects of test use sows confusion. Popham
argued that test developers should be concerned with consequences, intended and unintended, but a
test can be valid or invalid regardless of its consequences. Others, like Messick (1995), argue
that adverse consequences undermine validity only if the problem is the result of a poor fit of
the test with what it purports to measure.

References

Ackerman, P. L., Beier, M. E., & Bowen, K. R. (2002). What we really know about our abilities and
our knowledge. Personality and Individual Differences, 33, 587-605.

Aitchison, R. E. (1995). The effects of self-evaluation techniques on the musical performance,
self-evaluation accuracy, motivation and self-esteem of middle school instrumental music students.
Unpublished doctoral dissertation, University of Iowa.

Andrade, H. G., & Boulay, B. A. (2003). Role of rubric-referenced self-assessment in learning to
write. Journal of Educational Research, 97(1), 21-34.

Arter, J., Spandel, V., Culham, R., & Pollard, J. (1994, April). The impact of training students
to be self-assessors of writing. Paper presented at the annual meeting of the American Educational
Research Association, New Orleans.

Aschbacher, P. R. (1991). Performance assessment: State, activity, interest, and concerns. Applied
Measurement in Education, 4, 275-288.

Bandura, A. (1997). Self-efficacy: The exercise of control. New York: W. H. Freeman.

Bergee, M. J. (1997). Relationships among faculty, peer, and self-evaluations of applied
performances. Journal of Research in Music Education, 45(4), 601-612.

Blake, M. (2000). Developing a curriculum linking cooperative learning and student-directed
parent-teacher conferences. Schools in the Middle, 9(7), 40-44.

Blatchford, P. (1997). Students’ self assessment of academic attainment: Accuracy and stability
from 7 to 16 years and influence of domain and social comparison group. Educational Psychology,
17(3), 345-360.

Boud, D., & Falchikov, N. (1989). Quantitative studies of student self-assessment in higher
education: A critical analysis of findings. Higher Education, 18, 529-549.

Bransford, J. D., Brown, A. L., & Cocking, R. R. (1999). How people learn: Brain, mind, experience
and school. Washington, DC: National Academy Press.

Butler, R. (1990). The effects of mastery and competitive conditions on self-assessment at
different ages. Child Development, 61, 201-210.

Cassady, J. C. (2001). Self-reported GPA and SAT: A methodological note. Practical Assessment,
Research & Evaluation, 7(12). [Web Page]. Retrieved November 14, 2006 from
http://PAREonline.net/getvn.asp?v=7&n=12.

Dornisch, M. M., & Sabatini McLoughlin, A. (2006). Limitations of web-based rubric resources:
Addressing the challenges. Practical Assessment, Research & Evaluation, 11(3).

Fitzgerald, J. T., Gruppen, L. D., & White, C. B. (2000). The influence of task formats on the
accuracy of medical students' self-assessments. Academic Medicine, 75(7), 737-741.

Fontana, D., & Fernandes, M. (1994). Improvements in math performance as a consequence of
self-assessment in Portuguese primary school pupils. British Journal of Educational Psychology,
64, 407-417.

Fox, S., & Dinur, Y. (1988). Validity of self-assessment: A field evaluation. Personnel
Psychology, 41, 581-592.

Gregory, K., Cameron, C., & Davies, A. (2000). Self-assessment and goal-setting. Merville, BC:
Connections Publishing.

Henry, D. (1994). Whole language students with low self-direction: A self-assessment tool.
Charlottesville: University of Virginia.

Hillocks, G. (1986). Research on written composition: New directions for teaching. Urbana, IL:
ERIC Clearinghouse on Reading and Communication Skills.

Hughes, B., Sullivan, H., & Mosley, M. (1985). External evaluation, task difficulty, and
continuing motivation. Journal of Educational Research, 78(4), 210-215.

Johnson, D. W., Johnson, R., & Holubec, E. (1990). Circles of learning: Cooperation in the
classroom. Edina, MN: Interaction Book Company.

Kaderavek, J. N., Gillam, R. B., Ukrainetz, T. A., Justice, L. M., & Eisenberg, S. N. (2004).
School-age children's self-assessment of oral narrative production. Communication Disorders
Quarterly, 26(1), 37-48.

Klenowski, V. (1995). Student self-evaluation processes in student-centred teaching and learning
contexts of Australia and England. Assessment in Education, 2(2), 145-163.

Kuncel, N. R., Credé, M., & Thomas, L. L. (2005). The validity of self-reported grade point
averages, class ranks, and test scores: A meta-analysis. Review of Educational Research, 75(1),
63-82.

Longhurst, N., & Norton, L. S. (1997). Self-assessment in coursework essays. Studies in
Educational Evaluation, 23(4), 319-330.

McDonald, B., & Boud, D. (2003). The impact of self-assessment on achievement: The effects of
self-assessment training on performance in external examinations. Assessment in Education, 10(2),
209-220.

McEnery, J. M., & Blanchard, P. N. (1999). Validity of multiple ratings of business student
performance in a management simulation. Human Resource Development Quarterly, 10(2), 155-172.

McMillan, J. H. (2004). Classroom assessment: Principles and practice for effective instruction.
3rd edition. Toronto: Pearson Allyn Brown.

Messick, S. (1995). Standards of validity and the validity of standards in performance assessment.
Educational Measurement: Issues and Practice, 14(4), 5-8.

Moss, P. (1998). The role of consequences in validity theory. Educational Measurement: Issues and
Practice, 17(2), 6-12.

Nelson, J. R., Smith, D. J., & Colvin, G. (1995). The effects of a peer-mediated self-evaluation
procedure on the recess behavior of students with behavioral problems. Remedial and Special
Education, 16(2), 117-126.

Newman, F. (1997). Authentic assessment in social studies: Standards and examples. In G. D. Phye
(Ed.), Handbook of classroom assessment: Learning, adjustment, and achievement. San Diego:
Academic Press.

Noonan, B., & Duncan, C. R. (2005). Peer and self-assessment in high schools. Practical
Assessment, Research & Evaluation, 10(17).

Ontario Ministry of Education and Training. (2005). Ontario provincial elementary assessment and
evaluation: Effective elementary assessment and evaluation classroom practices. Toronto: Author.

Pakaslahti, L., & Keltikangas-Järvinen, L. (2000). Comparison of peer, teacher and
self-assessments on adolescent direct and indirect aggression. Educational Psychology, 20(2),
177-190.

Pappas, G., Lederman, E., & Broadbent, B. (2001). Monitoring Student Performance in Online
Courses: New Game-New Rules. Journal of Distance Education, 16(2).

Paris, S., Lawton, T., Turner, J., & Roth, J. (1991). A developmental perspective on standardized
achievement testing. Educational Researcher, 20(5), 12-20, 40.

Phillips, D. A., & Zimmerman, M. (1990). The development course of perceived competence and
incompetence among competent children. In R. J. Sternberg & J. J. Kolligian (Eds.), Competence
considered (pp. 41-66). London: Yale University Press.

Popham, W. J. (1997). Consequential validity: Right concern--wrong concept. Educational
Measurement: Issues and Practice, 16(2), 9-13.

Rolheiser, C. (Ed.). (1996). Self-evaluation...Helping kids get better at it: A teacher's resource
book. Toronto: OISE/UT.

Ross, J. A. (1995). Effects of feedback on student behavior in cooperative learning groups in a
grade 7 math class. Elementary School Journal, 96(2), 125-143.

Ross, J. A., Hogaboam-Gray, A., & Rolheiser, C. (2002a). Self-evaluation in grade 11 mathematics:
Effects on achievement and student beliefs about ability. In D. McDougall (Ed.), OISE Papers on
Mathematical Education. Toronto: University of Toronto.

Ross, J. A., Hogaboam-Gray, A., & Rolheiser, C. (2002b). Student self-evaluation in grade 5-6
mathematics: Effects on problem solving achievement. Educational Assessment, 8(1), 43-58.

Ross, J. A., Rolheiser, C., & Hogaboam-Gray, A. (1999). Effect of self-evaluation on narrative
writing. Assessing Writing, 6(1), 107-132.

Ross, J. A., Rolheiser, C., & Hogaboam-Gray, A. (2002c). Influences on student cognitions about
evaluation. Assessment in Education, 9(1), 81-95.

Ross, J. A., Rolheiser, C., & Hogaboam-Gray, A. (1998). Skills training versus action research
in-service: Impact on student attitudes to self-evaluation. Teaching and Teacher Education, 14(5),
463-477.

Ross, J. A., & Starling, M. (2005, April). Achievement and self-efficacy effects of
self-evaluation training in a computer-supported learning environment. Paper presented at the
annual meeting of the American Educational Research Association, Montreal.

Ross, S. (1998). Self-assessment in second language testing: A meta-analysis and analysis of
experiential factors. Language Testing, 15(1), 1-20.

Schunk, D. H. (1996). Goal and self-evaluative influences during children's cognitive skill
learning. American Educational Research Journal, 33(2), 359-382.

Sparks, G. E. (1991). The effect of self-evaluation on musical achievement, attentiveness and
attitudes of elementary school instrument students. Unpublished doctoral dissertation, Louisiana
State University and Agricultural and Mechanical College.

Stipek, D., Recchia, S., & McClintic, S. (1992). Self-evaluation in young children. Monographs of
the Society for Research in Child Development, 57, 1-84.

Sundström, A. Self-assessment of knowledge and abilities: A literature study [Web Page]. URL
http://www.umu.se/edmeas/publikationer/pdf/EM541.pdf [2006, November 14].

Sung, Y.-T., Chang, K.-E., Chiou, S.-K., & Hou, H.-T. (2005). The design and application of a
web-based self- and peer-assessment system. Computers and Education, 45(2), 187-202.

Talento-Miller, E., & Peyton, J. (2006). Moderators of the accuracy of self-report Grade Point
Average. Graduate Management Admission Council Research Reports • RR-06-10 [Web Page]. URL
http://www.gmac.com/NR/rdonlyres/9B6C60A5-F9D6-4E69-91CB-22F949E7CA59/0/RR0610_SRUGPA.pdf
[2006, November 14].

Usher, E. L., & Pajares, F. (2005, April). Inviting confidence in school: Sources and effects of
academic and self-regulatory efficacy beliefs. Paper presented at the meeting of the American
Educational Research Association, Montreal.

Wiggins, G. (1993). Assessing student performance: Explore the purpose and limits of testing. San
Francisco, CA: Jossey-Bass.

Wiggins, G. (1998). Educative Assessment: Designing assessments to inform and improve student
performance. San Francisco: Jossey-Bass.

Citation
Ross, John A. (2006). The Reliability, Validity, and Utility of Self-Assessment. Practical Assessment Research
& Evaluation, 11(10). Available online: http://pareonline.net/getvn.asp?v=11&n=10

Author

Dr. John A. Ross,


Professor and Centre Head,
PO Box 719, 1994 Fisher Dr.
Peterborough, ON K9J 7A1
Canada
