JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted
digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about
JSTOR, please contact support@jstor.org.
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
http://about.jstor.org/terms
Taylor & Francis, Ltd. is collaborating with JSTOR to digitize, preserve and extend access to The Journal
of Educational Research
This content downloaded from 128.6.218.72 on Mon, 06 Feb 2017 16:46:08 UTC
All use subject to http://about.jstor.org/terms
Elementary Teachers' Classroom
Assessment and Grading Practices
JAMES H. MCMILLAN
STEVE MYRAN
DARYL WORKMAN
Virginia Commonwealth University
204 The Journal of Educational Research
March/April 2002 [Vol. 95(No. 4)]

sure student ability to apply what they have learned. Stiggins and Conklin (1992) asked 24 teachers to keep a journal to reflect their assessment practices. The analysis focused on how teachers described their assessments and which specific issues related to their assessments were raised. The researchers found that teachers were most interested in assessing student mastery or achievement and that performance assessment was used frequently. The nature of the assessments used in each class was coupled closely with the roles that each teacher set for her students, teacher expectations, and the type of teacher-student interactions desired.

Marso and Pigge (1993) summarized research that suggests that elementary teachers place more emphasis on students' constructed-response work samples than on traditional paper-and-pencil tests. They also reported that direct observation is used in language arts assessments more than in other subjects and that essay assessments, although infrequent, tend to occur most in language arts, history, and social studies. The vast majority of teachers use several types of assessments, generally placing greatest emphasis on completion and short-answer questions. Stiggins and Bridgeford (1985) also found that elementary teachers tend to stress constructed-response tests rather than objective types of tests, although both are used extensively. Stiggins and Bridgeford (1985) reported that the use of teacher-made objective tests is positively related to grade level, that published tests tend to be used more in early grades, and that teacher-made tests are relied on more for mathematics than for English assessments. The authors concluded that "grade level appears to be an important variable in understanding classroom assessment" (p. 281).

In a survey of 143 elementary and secondary school teachers, Cizek, Fitzgerald, Shawn, and Rachor (1995) found that assessment practices "were highly variable and unpredictable from characteristics such as practice setting, gender, years of experience, grade level or familiarity with assessment policies in their school district" (p. 159). This finding suggests that grade level may not be as important as variations found between individual teachers. Overall, the authors concluded that "many teachers seemed to have individual assessment policies that reflected their own individualistic values and beliefs about teaching" (p. 160). The highly variable nature of assessment practices, as is pointed out in the following paragraphs, is consistent with how teachers grade students. The Cizek et al. study was limited to elementary teachers attending a university measurement course and by the use of a limited number of questions that were restricted to 10 factors used in grading and five sources of assessment-related information. Furthermore, respondents simply checked whether each factor or source was used, without any indication of the extent of use.

Plake and Impara (1997) summarized results from a large-scale survey of teachers that was structured to measure teacher competency concerning assessment practices by asking teachers to indicate which of several possible answers to assessment questions was best. A national random sample of 555 elementary, middle, and high school teachers was used. Overall mean performance on the survey was 66% correct. Teachers did better on items related to choosing and administering assessments and significantly worse on communicating results. According to the authors, the results "give empirical evidence of the anticipated woefully low levels of assessment competency for teachers" (p. 67). The results also showed that teachers who had taken a measurement course performed better than did teachers who lacked this background.

In summary, the existing literature on elementary classroom assessment practices indicates that teachers probably need further training to improve the quality of the assessments that are used. Whatever the type of question used on assessments, few are written to tap students' higher level thinking skills. Appropriately, teachers appear to use a variety of assessment methods. Particularly absent in the literature, however, are large-scale examinations of relationships between classroom assessment practices and grade level and subject matter. There is some evidence of trends across grade level and subject matter, as in Stiggins and Bridgeford (1985) and Marso and Pigge (1993), but small samples are used in this study, and the researchers did not focus on differences between different elementary grades.

Teachers' grading practices have received far more attention in the literature than have assessment practices. This fact may be due to the salient and summative nature of grades to students and parents. Grades have important consequences and communicate student progress to parents. Stiggins, Frisbie, and Griswold (1989) set the stage for recent research on grading by providing an analysis of current grading practices as related to recommendations of measurement specialists and newly established Standards for Teacher Competence in Educational Assessment of Students (American Federation of Teachers, National Council on Measurement in Education, National Education Association, 1990). In this study, the authors interviewed and/or observed 15 teachers on 19 recommendations from the measurement literature. They found that teachers use a wide variety of approaches to grading and that they want their grades to fairly reflect both student effort and achievement, as well as to motivate students. Contrary to recommended practice, Stiggins and colleagues found that teachers value student motivation and effort and set different levels of expectation on the basis of student ability. This finding is consistent with an earlier study by Gullickson (1985), in which elementary teachers indicated that they used nontest information, such as class discussion and student behavior, more than test results to grade students. Gullickson also reported little difference in grading practices between science, social science, and language arts. Given the increased emphasis on school and student accountability due to high-stakes testing, the relative emphasis that teachers give to tests may be increasing.

Brookhart (1994) conducted a comprehensive review of literature on teachers' grading practices. Her review identified 19 studies completed since 1984. In seven studies, researchers investigated secondary school grading; in 11 studies, both elementary and secondary school grading; and in one study, elementary school teachers. She identified three general methods of study: (a) surveys in which teachers responded to questions concerning components included in grading, grade distributions, and attitudes toward grading issues; (b) surveys in which teachers were asked to respond to grading scenarios, asking what they would do in various circumstances; and (c) qualitative methods, including interviews, observation, and document analysis. Despite methodological and grade-level differences, the findings from these studies are remarkably similar. Taking the studies together, Brookhart came to the following conclusions:

• Teachers inform students of the components used in grading.
• Teachers try hard to be fair in grading.
• Measures of achievement, especially tests, are major contributors to grades.
• Student effort and ability are used widely as components of grades.
• Elementary teachers rely on more informal evidence and observation, whereas secondary teachers use paper-and-pencil achievement tests and other written evidence as major contributors.
• Teachers' grading practices vary considerably from one teacher to another, especially in perceived meaning and purpose of grades, and how nonachievement factors will be considered.
• Teachers' grading practices are not consistent with recommendations of measurement specialists, especially confounding effort with achievement.

In one study, Brookhart (1993) investigated the meaning that teachers give to grades and the extent to which value judgments are used in assigning grades. The results indicated that low-ability students who tried hard would be given a passing grade even if the numerical grade were failure, although working below ability level did not affect the numerical grade. That is, an average or above-average student would get the grade earned, whereas a below-average student would get a break if there were sufficient effort to justify it. Teachers were divided about how to factor in missing work. About half of the teachers indicated that a zero should be given, even if that meant a failure for the semester. The remaining teachers would lower the grade, but not to a failure. The teachers' written comments showed that they strived to be fair to students. Teachers also seemed to indicate that a grade was a form of payment to students for work completed. That is, grades were something that students earned, as compensation for work completed. This finding suggests that teachers, either formally or informally, include conceptions of student effort in assigning grades. Because teachers are concerned with student motivation, self-esteem, and the social consequences of giving grades, using student achievement as the sole criterion for determining grades is rare. This finding is consistent with earlier work by Brookhart (1991), in which she pointed out that grading often consists of a hodgepodge of attitude, effort, and achievement. A limitation of this study is the small sample of elementary teachers (30) and the use of only three nonachievement factors in scenarios that participants responded to (effort/ability, missing work, and improvement). In addition, the students in our study were taking a university measurement course, which could result in socially desirable responses or answers that reflect the perspectives of the instructor.

Brookhart's conclusion concerning the variety of factors that go into grading is consistent with Cizek et al. (1995). Cizek and colleagues also found that teachers generally use a variety of objective and subjective factors to maximize the likelihood that students obtain good grades.

In summary, the literature specific to elementary teachers' assessment and grading practices is limited. There is an indication that teachers believe it is important to combine nonachievement factors, such as effort, ability, and conduct, with student achievement to determine grades; however, most of the studies that provide the basis for this conclusion have been conducted with secondary-level teachers. Although the studies are clear in this conclusion, less is known about how elementary teachers decide to weigh these nonachievement factors in determining grades and whether particular factors tend to be considered together or whether elementary teachers separate nonachievement factors, such as effort and improvement, from achievement. Also, in most of the surveys and other approaches in previous studies, researchers have asked teachers about their beliefs or projected behavior on the basis of scenarios. Actual assessment and grading practice may be different. Few researchers used a scale in which teachers indicated the actual use of different assessment and grading practices that allowed independent recording of the emphasis of each factor, rather than asking teachers to indicate the relative emphasis of each factor by percentage. Finally, there is no research on whether a relationship exists between the types of assessments used and grades received by students.

In the present study, we used a large sample of elementary school teachers to describe assessment and grading practices in a way that builds on and extends previous studies. We addressed four specific research questions, as follows:

1. What is the current state of assessment practice and grading by elementary teachers?
2. What are major assessment and grading components that are used by elementary teachers?
3. What is the relationship between assessment and grading practices and grades given to students?
4. What are the relationships between the independent variables grade level and subject taught (mathematics and language arts) and the dependent variables assessment and grading practices?
Table 1. Means and Standard Deviations of All Items Measuring Assessment and Grading Practices for Elementary Teachers (N = 901)

Question                     Not at all   Very little   Some   Quite a bit   Extensively   Completely
Factors contributing to grades
  Objective assessments           2            8         28        36            21             5
  Performance assessments        14           23         38        17             7             1
                        A              B              C              D              F
Grade level         Math     LA    Math     LA    Math     LA    Math     LA    Math     LA
3 (n = 294)        23.97  22.56   35.02  34.20   21.33  23.77    6.29   6.24    2.24   1.78
4 (n = 258)        21.39  22.27   35.50  34.73   24.52  24.35    6.67   6.22    2.90   2.33
5 (n = 205)        17.56  20.83   33.32  32.24   23.60  23.90    7.54   7.31    2.62   2.33
Mixed (n = 102)    19.56  18.52   32.18  33.08   21.91  22.48    7.11   6.07    2.54   1.75
Total              20.62  21.05   34.00  33.56   22.84  23.63    6.90   6.46    2.58   2.05

Note. Math = mathematics; LA = language arts. Percentages were estimated by the teachers.
mathematics grades awarded were 12% or less for 35 elementary schools, whereas for 20 schools, the percentage of mathematics A's awarded was 32%. It also shows a large between-schools variation of the number of A's awarded.

Table 4. Variation Within and Between Elementary Schools for Selected Items (N = 105)

Question                                            Mean variation within   Mean variation between
% A's awarded in mathematics                                16.2                    10.4
Student effort: how much students tried to learn             .92                     .57
Assessments that measure student reasoning                   .81                     .42
Objective assessments                                        .97                     .51

Data Reduction

Before examining the relationships between subject (mathematics compared with language arts) and grade level (Grades 3, 4, and 5), we performed data reduction for each of the major categories of items (factors, types, and cognitive levels) for both mathematics and language arts. The first step in the data reduction was to eliminate items that showed a floor effect with little variability. We used the remaining items in the second step of the data reduction, an exploratory factor analysis, to identify relationships among the items by reducing them to a few relatively independent, but conceptually meaningful, composite variables called components. A varimax rotation was used for the factor analyses.

The factor analysis for items used in grading (factors) resulted in six components. There were no differences between mathematics and language arts. The loadings of different items are summarized in Table 5.

The first component was comprised of three items that emphasized effort, ability, improvement, work habits, attention, and participation. These items could be considered enablers to academic performance, important indicators to teachers to judge the degree to which the student has tried to learn and, by implication, actually learned. A second component was defined by the two items that included questions about homework. The third component was loaded on one item concerning grade distributions of other teachers. The fourth component included three items that focused on academic performance of the student. The fifth component was loaded highly on two items that included comparisons with other students. The sixth component included the suggestion that borderline work and using extra credit are related and distinct from other factors. Thus, there appear to be six conceptually meaningful variables that elementary teachers use when grading students for both language arts and mathematics. These variables include actual performance; effort, ability, and improvement; homework; other teachers' grading; comparisons with other students; and borderline cases. Given the relatively low emphasis on homework, comparisons with other students, other teachers' grading, and the infrequent occurrence of borderline cases, these results suggest that teachers conceptualize two major ingredients: actual performance, and effort, ability, and improvement. Of these two, academic performance clearly is most important, but effort, ability, and improvement remain fairly important, especially for some teachers.
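
The two data-reduction steps described in this section (dropping floor-effect items, then an exploratory factor analysis with varimax rotation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the responses are synthetic, the floor-effect cutoffs (`min_mean`, `min_std`) and all function names are hypothetical, and principal-component extraction from the item correlation matrix is assumed because the article does not state which extraction method was used.

```python
import numpy as np


def drop_floor_items(responses, min_mean=1.5, min_std=0.5):
    """Return indices of items that do NOT show a floor effect.

    An item counts as a floor item when its mean sits near the bottom of
    the scale and its spread is small. Both cutoffs are hypothetical.
    """
    means = responses.mean(axis=0)
    stds = responses.std(axis=0, ddof=1)
    is_floor = (means < min_mean) & (stds < min_std)
    return np.flatnonzero(~is_floor)


def principal_loadings(responses, n_components):
    """Unrotated loadings from the item correlation matrix (assumed method)."""
    corr = np.corrcoef(responses, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues ascending
    top = np.argsort(eigvals)[::-1][:n_components]   # largest first
    return eigvecs[:, top] * np.sqrt(eigvals[top])


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of an (items x components) loading matrix."""
    n_items, n_components = loadings.shape
    rotation = np.eye(n_components)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.sum(rotated**2, axis=0)
        target = rotated**3 - rotated * (column_ss / n_items)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                            # stays orthogonal
        current = s.sum()
        if previous != 0.0 and current / previous < 1.0 + tol:
            break
        previous = current
    return loadings @ rotation


# Synthetic stand-in for survey responses: six items driven by two latent
# traits, plus one item that almost every respondent rates at the minimum.
rng = np.random.default_rng(0)
n_teachers = 200
latent = rng.normal(size=(n_teachers, 2))
noise = rng.normal(scale=0.5, size=(n_teachers, 6))
items = np.empty((n_teachers, 6))
items[:, :3] = 3.5 + latent[:, [0]] + noise[:, :3]
items[:, 3:] = 3.5 + latent[:, [1]] + noise[:, 3:]
floor_item = np.where(rng.random(n_teachers) < 0.05, 2.0, 1.0)
responses = np.column_stack([items, floor_item])

kept = drop_floor_items(responses)                   # step 1: remove floor items
unrotated = principal_loadings(responses[:, kept], n_components=2)
rotated = varimax(unrotated)                         # step 2: rotate loadings
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each item's communality (its row sum of squared loadings) is unchanged; only the distribution of loadings across components shifts, which is what makes the rotated components easier to label.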
The factor analysis for types of assessments used resulted in three components. The item loadings were, for the most part, the same for both subjects. The first component was comprised of six items for mathematics types and four items for language arts types, each of which described some kind of constructed-response assessment, such as essays (mathematics only), projects, and performance assessments. The second component, made up of either two or three items, included objective assessments, quizzes (language arts only), and assessments provided by publishers. Evidently, items provided by publishers are used in both quizzes and objective assessments. The third component was comprised of two items for mathematics (major examinations and teacher-made tests) and two items for language arts (teacher-made tests and essays). This result suggests that the common element in the third component was that the assessments were teacher made. For mathematics and for language arts essays, the major examinations tend to be teacher made.

The factor analysis for cognitive levels showed high intercorrelation among the three items that suggested higher order knowledge and skills (understanding, reasoning,
and application). Teachers tended to think about these items as one kind of skill, apart from recall knowledge, which did not load on this analysis.

Note. Unless otherwise noted, factor loadings for mathematics and language arts were averaged.

Relationship Results

In the relationship analyses for subject matter and grade level, we used paired t tests and analysis of variance, respectively, with standardized component scores for the items loading on each of the 10 components derived from the factor analyses, plus the percentage of A's given, as dependent variables. We also performed a regression analysis to determine if assessment and grading practices predict grades. Thus, there were two independent variables: subject matter, with two levels, and grade level, with three levels, in the first two analyses; we used all 10 components as independent variables to predict the percentage of A's awarded.

The t test analyses showed that there were few differences between language arts and mathematics assessment and grading practices, despite the large sample size that would have made it easy to detect statistically significant differences. Clearly, there was more in common than there was different on the basis of these two content areas. As might be expected, differences occurred for the extent to which performance assessments were used (mean of 2.33 for mathematics and 3.41 for language arts), projects completed by individual students (mean of 3.01 for mathematics and 3.56 for language arts), and the use of assessments provided by publishers (mean of 3.56 for mathematics and 3.23 for language arts). Thus, only three items in the category types of assessments showed a difference between mathematics and language arts. When considering other factors such as effort, participation, homework, and so forth, as well as cognitive levels, we found no difference between the mathematics and language arts responses.
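
The paired comparison between subjects can be illustrated with a small worked sketch. The ratings below are invented for illustration and are not the study's data, the function name is hypothetical, and only the t statistic and its degrees of freedom are computed; a p value would additionally require the t distribution. Each teacher contributes one rating per subject, so the test is run on the per-teacher differences.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic for two equal-length lists of ratings.

    Returns (t, degrees_of_freedom), where t = mean(d) / (sd(d) / sqrt(n))
    and d holds the per-teacher differences second - first.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical ratings from five teachers for how much they use performance
# assessments in each subject (higher = more use).
math_ratings = [2, 3, 2, 3, 2]
language_arts_ratings = [3, 4, 4, 3, 4]

t_stat, df = paired_t(math_ratings, language_arts_ratings)
print(f"t({df}) = {t_stat:.2f}")  # prints "t(4) = 3.21"
```

Pairing removes between-teacher differences in overall rating level, which is why the design can detect small subject effects when the sample is large.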
Table 6. Relationship of Grade Levels (3, 4, and 5) to Assessment Practices of Elementary Teachers (N = 873)
Mathematics
Language arts

Table 7. Factors, Types of Assessments, and Cognitive Levels as Predictor Variables of Percentage A's Awarded for Elementary Teachers

Subject                         Significant positive relationship        Significant negative relationship
Mathematics (n = 714)    .21    Higher order thinking and application    Objective assessments
                                                                         Publisher-provided items
                                                                         Homework
We used one-way analyses of variance with Scheffé post hoc tests to examine the relationship between grade level and assessment and grading practices. The results of these analyses are shown in Table 6, which contains a summary of the components that indicate no relationship, those that show a positive relationship, and the single variable that shows a negative relationship.

As with other analyses, the major finding was no difference between grade levels on components that were most important to assessment and grading. For both language arts and mathematics, the results showed that as grade level increases, so does the importance of homework, extra credit, and constructed-response assessments. For mathematics, the importance of objective assessments showed a positive relationship with grade level. In language arts, teacher-made major examinations contributed more in higher grades. The only negative relationship was found in the percentage of A's awarded, which means that fewer A's were awarded in higher grades.

The predictive relationship between assessment and grading practices was examined with stepwise multiple regressions, one for language arts and one for mathematics, with percentage of A's awarded as the dependent variable and the eight weighted component scores as independent variables. The results of these regressions are summarized in Table 7.

The multiple correlation coefficients were relatively small in both regressions, suggesting that the major predictors of grades were not the weight given to different factors, types of assessments, or cognitive level of assessments. Given that finding, the percentage of A's awarded tended to increase with increased weight given to higher order thinking assessments for mathematics, and constructed-response assessments for language arts. Negative relationships were found for mathematics with objective assessments, publisher-provided items, and homework, and for language arts with extra credit.

Discussion

The results of the analyses, consistent with earlier research by Brookhart (1994) and Cizek et al. (1995), show that most elementary teachers use a multitude of factors in grading students. The hodgepodge of factors considered when grading appears to be organized into six distinct components. Academic performance is clearly the most important factor in grading students, as also reported by Stiggins and Conklin (1992), but the results of the present study show that nontest performance and behavior, such as effort, participation, and extra credit work, also are very important for many teachers, consistent with the Gullickson (1985) study. Disruptive student behavior, grade distributions of other teachers, and norm-referenced interpretations contribute little to grading. A substantial percentage of elementary teachers include zeros in the calculation of grades. Because there are different ways this can be done, this finding suggests a need to explore in greater depth how this is accomplished.

Three major types of assessments are used: constructed response, such as projects, essays, and presentations; objective assessments; and teacher-made major examinations. Although objective assessments are used most frequently, there is also a great reliance on constructed-response types of assessments. Teachers tend to differentiate the cognitive level of their assessments into two categories: recall knowledge, and higher order thinking and application. Higher order thinking and application are emphasized heavily. There is a significant reliance on assessments that are designed by publishers, even if most assessment is teacher made. This finding suggests that teachers need training in how to evaluate the quality of their own assessments, as well as those provided by others.

Along with the variety of factors that go into grading, great variation exists within schools concerning the extent to which teachers emphasize different factors in grading students. The finding that within-school variance is greater than between-school variance suggests that individual teacher preferences are more important than are differences between schools in determining grading practices. These data suggest that teachers vary considerably in how they weigh different factors, even within the same building, and that school and student characteristics as a whole are less important than are individual beliefs. This finding is consistent with the highly variable results found by Cizek et al. (1995) and McMillan and Nash (2000), confirming that an important characteristic of classroom assessment and grading practices is that they are highly individualized and may be unique from one teacher to another, even in the same school. It may be useful for teachers to discuss such differences and consider whether more consistency would provide students with a clearer message about what is important.

An implication of this variability of grading is that there seems to be only moderate consistency among teachers about what is most important and how different factors are weighted. Why is this the case? One possible answer is that teachers view grading as an extension of teaching to promote student success in general in many areas important to schooling, including both academic achievement and academic-enabling behaviors, such as responsibility, effort, improvement, participation, and cooperation. If teachers would award grades solely on the basis of academic performance, there might be much more consistency. This finding also may explain why teachers' grading practices do not follow established measurement principles. McMillan and Nash (2000) found that teachers base their grading practices on their educational philosophies and on what is best for each student. Considering differences in styles of teaching, types of students, and curriculum, perhaps it is not surprising that there is high variability of grading practices. From a policy perspective, an appropriate question is whether there is a good reason to permit or even encourage differences in grading practices or there is a need for educational measurement experts to modify recommendations to be more consistent with the realities of teaching and the multitude of factors that influence teacher behavior. An implication for the preparation of teachers is that it is important to train them to integrate assessment meaningfully into instruction. Another implication for teacher training is to explore with students the importance of conceptualizing assessment as a process involving a multitude of factors, combined in unique ways that may differ for each individual.

The factor analysis indicated that several student behaviors considered together indicate what has been labeled here as an Academic Enabling factor. These behaviors include student effort, participation, improvement, ability, and discussion. It appears that teachers use all of these behaviors, apart from actual academic performance, as important indicators for determining grades. This finding has theoretical significance when one is conceptualizing the nature of student effort in relation to assessment. Brookhart (1997), for example, has identified mental effort and realized student effort as a major component of a theoretical framework for investigating the effects of classroom assessment. The present findings suggest that the manner in which effort is defined in this model may need to be broadened to include some indication of improvement and ability. It may well be that teachers think about effort as mediated by improvement and ability. That is, what is regarded as high or low effort is dependent on what knowledge and skills students bring to the classroom and on a more general impression of ability.

It is clear that many teachers use academically enabling behaviors, such as participation, effort, and improvement, extensively to determine grades, whereas other teachers do not use these variables very much. This finding may reveal educational philosophies or approaches that give different messages to students. For example, teachers who reward effort may be allowing students who are not competent, as well as their parents, to believe that they demonstrate needed knowledge and skills. This action would be especially troubling if it occurred more with low-socioeconomic status (SES) students, who could be rewarded for their effort to maintain involvement, because they might not be provided with feedback that more accurately indicates their level of knowledge and skills. Also, are teachers who weight effort more in reality "coddling" students, making it easier for them to obtain passing grades? Some support for this implication is found in a study by Cauley and McMillan (2000), who found that middle school teachers at low-SES schools (determined by the percentage of students eligible for free and reduced-price lunches) tend to use nontest factors more in grading than do teachers at high-SES schools.

Although few relationships were reported between assessment practices and grade level, greater emphasis in later grades is placed on homework, extra credit, constructed-response assessments, objective assessments, and major