
Elementary Teachers' Classroom Assessment and Grading Practices

Author(s): James H. McMillan, Steve Myran and Daryl Workman


Source: The Journal of Educational Research, Vol. 95, No. 4 (Mar. - Apr., 2002), pp. 203-213
Published by: Taylor & Francis, Ltd.
Stable URL: http://www.jstor.org/stable/27542381
Accessed: 06-02-2017 16:46 UTC

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted
digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about
JSTOR, please contact support@jstor.org.

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
http://about.jstor.org/terms

Taylor & Francis, Ltd. is collaborating with JSTOR to digitize, preserve and extend access to The Journal
of Educational Research

This content downloaded from 128.6.218.72 on Mon, 06 Feb 2017 16:46:08 UTC
All use subject to http://about.jstor.org/terms
Elementary Teachers' Classroom
Assessment and Grading Practices
JAMES H. MCMILLAN
STEVE MYRAN
DARYL WORKMAN
Virginia Commonwealth University

ABSTRACT The authors investigated the assessment and grading practices of over 900 Grades 3-5 teachers representing urban, suburban, and rural schools. Teachers indicated the extent to which they used various factors to grade students, the types of assessments used, the cognitive level of assessments, and the grades awarded. Teachers appeared to conceptualize 6 major factors when they graded students; they placed the greatest weight on academic performance and academic enabling behaviors, such as effort and improvement, and much less emphasis on homework, comparisons with other students, grade distributions of other teachers, and borderline cases. The teachers used 3 types of assessments—constructed response, objective, and teacher-made major examinations; they differentiated between recall and higher level cognitive skills. However, there were few relationships between assessment and grade level, subject matter assessed, and grades awarded. Results are discussed in light of other research, indicating that teachers use a "hodgepodge" of factors when assessing and grading students.

Key words: classroom assessment, elementary teachers, grading practices

Address correspondence to James H. McMillan, P.O. Box 842020, Department of Educational Studies, Virginia Commonwealth University, Richmond, VA 23284 (E-mail: jmcmillan@edunet.soe.vcu.edu)

A significant amount of recent literature has focused on classroom assessment and grading as essential aspects of effective teaching. An increased scrutiny of assessment is evidenced by the popularity of performance assessment and portfolios; newly established national assessment competencies for teachers (American Federation of Teachers, National Council on Measurement in Education, and National Education Association, 1990); and the interplay between learning, motivation, and assessment (Brookhart, 1993, 1994; Tittle, 1994).

Researchers have documented teachers' tendency to award a "hodgepodge grade of attitude, effort, and achievement" (Brookhart, 1991, p. 36; Cross & Frary, 1996), although this conclusion was reached primarily on results of surveys of secondary-level teachers. It is also clear that teachers use a variety of assessment techniques, even if established measurement principles are often violated (Cross & Frary; Frary, Cross, & Weber, 1993; Gullickson, 1993; Plake & Impara, 1993; Stiggins & Conklin, 1992). In addition, over the last decade, significant emphasis has been placed on using alternative assessments, such as performance assessments and portfolios, rather than on traditional paper-and-pencil assessments.

Given the variety of assessment and grading practices in the field, the increasing importance of alternative assessments, the critical role that each classroom teacher plays in determining assessments and grades, and the trend toward greater accountability of teachers with state assessment approaches that are inconsistent with much of the current literature, one needs to fully understand current assessment and grading practices. Such information is important to understand how classroom assessments are constructed and used in this new climate. The purpose of this investigation was to describe actual classroom assessment and grading practices of upper level elementary teachers, to determine the primary factors used in grading, and to determine whether meaningful relationships exist between independent variables grade level and subject taught and dependent variables assessment and grading practices.

The literature tends to separate assessment practices from grading practices. In this review, we first examined assessment practices. Airasian (1984) reviewed literature that suggests that teachers focus their classroom assessments in two areas—academic achievement and social behavior. The importance of these items varies with grade level; elementary teachers place greater importance on social behavior. Fleming and Chambers (1983), in a study that analyzed nearly 400 teacher-developed classroom tests, made the following conclusions about the nature of classroom assessment: (a) Short-answer questions are used most frequently; (b) essay questions, which represent slightly more than 1% of test items, are avoided; (c) matching items are used more than multiple-choice or true-false items; (d) most test questions, approximately 69%, sample knowledge of terms, facts, and rules and principles; and (e) few test items measure student ability to apply what they have learned. Stiggins and Conklin (1992) asked 24 teachers to keep a journal to reflect their assessment practices. The analysis focused on how teachers described their assessments and which specific issues related to their assessments were raised. The researchers found that teachers were most interested in assessing student mastery or achievement and that performance assessment was used frequently. The nature of the assessments used in each class was coupled closely with the roles that each teacher set for her students, teacher expectations, and the type of teacher-student interactions desired.

Marso and Pigge (1993) summarized research that suggests that elementary teachers place more emphasis on students' constructed-response work samples than on traditional paper-and-pencil tests. They also reported that direct observation is used in language arts assessments more than in other subjects and that essay assessments, although infrequent, tend to occur most in language arts, history, and social studies. The vast majority of teachers use several types of assessments, generally placing greatest emphasis on completion and short-answer questions. Stiggins and Bridgeford (1985) also found that elementary teachers tend to stress constructed-response tests rather than objective types of tests, although both are used extensively. Stiggins and Bridgeford (1985) reported that the use of teacher-made objective tests is positively related to grade level, that published tests tend to be used more in early grades, and that teacher-made tests are relied on more for mathematics than for English assessments. The authors concluded that "grade level appears to be an important variable in understanding classroom assessment" (p. 281).

In a survey of 143 elementary and secondary school teachers, Cizek, Fitzgerald, Shawn, and Rachor (1995) found that assessment practices "were highly variable and unpredictable from characteristics such as practice setting, gender, years of experience, grade level or familiarity with assessment policies in their school district" (p. 159). This finding suggests that grade level may not be as important as variations found between individual teachers. Overall, the authors concluded that "many teachers seemed to have individual assessment policies that reflected their own individualistic values and beliefs about teaching" (p. 160). The highly variable nature of assessment practices, as is pointed out in the following paragraphs, is consistent with how teachers grade students. The Cizek et al. study was limited to elementary teachers attending a university measurement course and the use of a limited number of questions that were restricted to 10 factors used in grading and five sources of assessment-related information. Furthermore, respondents simply checked whether each factor or source was used, without any indication of the extent of use.

Plake and Impara (1997) summarized results from a large-scale survey of teachers that was structured to obtain teacher competency concerning assessment practices, by asking teachers to indicate which of several possible answers to assessment questions was best. A national random sample of 555 elementary, middle, and high school teachers was used. Overall mean performance on the survey was 66% correct. Teachers did better on items related to choosing and administering assessments and significantly worse on communicating results. According to the authors, the results "give empirical evidence of the anticipated woefully low levels of assessment competency for teachers" (p. 67). The results also showed that teachers who had taken a measurement course performed better than did teachers who lacked this background.

In summary, the existing literature on elementary classroom assessment practices indicates that teachers probably need further training to improve the quality of the assessments that are used. Whatever the type of question used on assessments, few are written to tap students' higher level thinking skills. Appropriately, teachers appear to use a variety of assessment methods. Particularly absent in the literature, however, are large-scale examinations of relationships between classroom assessment practices and grade level and subject matter. There is some evidence of trends across grade level and subject matter, as in Stiggins and Bridgeford (1985) and Marso and Pigge (1993), but small samples were used in those studies, and the researchers did not focus on differences between different elementary grades.

Teachers' grading practices have received far more attention in the literature than have assessment practices. This fact may be due to the salient and summative nature of grades to students and parents. Grades have important consequences and communicate student progress to parents. Stiggins, Frisbie, and Griswold (1989) set the stage for recent research on grading by providing an analysis of current grading practices as related to recommendations of measurement specialists and newly established Standards for Teacher Competence in Educational Assessment of Students (American Federation of Teachers, National Council on Measurement in Education, National Education Association, 1990). In this study, the authors interviewed and/or observed 15 teachers on 19 recommendations from the measurement literature. They found that teachers use a wide variety of approaches to grading and that they want their grades to fairly reflect both student effort and achievement, as well as to motivate students. Contrary to recommended practice, Stiggins and colleagues found that teachers value student motivation and effort and set different levels of expectation on the basis of student ability. This finding is consistent with an earlier study by Gullickson (1985), in which elementary teachers indicated that they used nontest information, such as class discussion and student behavior, more than test results to grade students. Gullickson also reported little difference in grading practices between science, social science, and language arts. Given the increased emphasis on school and student accountability due to high-stakes testing, the relative emphasis that teachers give to tests may be increasing.

Brookhart (1994) conducted a comprehensive review of literature on teachers' grading practices. Her review identified 19 studies completed since 1984. In seven studies, researchers investigated secondary school grading; in 11 studies, both elementary and secondary school grading; and in one study, elementary school teachers. She identified three general methods of study: (a) surveys in which teachers responded to questions concerning components included in grading, grade distributions, and attitudes toward grading issues; (b) surveys in which teachers were asked to respond to grading scenarios, asking what they would do in various circumstances; and (c) qualitative methods, including interviews, observation, and document analysis. Despite methodological and grade-level differences, the findings from these studies are remarkably similar. Taken together, Brookhart came to the following conclusions:

• Teachers inform students of the components used in grading.
• Teachers try hard to be fair in grading.
• Measures of achievement, especially tests, are major contributors to grades.
• Student effort and ability are used widely as components of grades.
• Elementary teachers rely on more informal evidence and observation, whereas secondary teachers use paper-and-pencil achievement tests and other written evidence as major contributors.
• Teachers' grading practices vary considerably from one teacher to another, especially in perceived meaning and purpose of grades, and how nonachievement factors will be considered.
• Teachers' grading practices are not consistent with recommendations of measurement specialists, especially confounding effort with achievement.

In one study, Brookhart (1993) investigated the meaning that teachers give to grades and the extent to which value judgments are used in assigning grades. The results indicated that low-ability students who tried hard would be given a passing grade even if the numerical grade were failure, although working below ability level did not affect the numerical grade. That is, an average or above-average student would get the grade earned, whereas a below-average student would get a break if there were sufficient effort to justify it. Teachers were divided about how to factor in missing work. About half of the teachers indicated that a zero should be given, even if that meant a failure for the semester. The remaining teachers would lower the grade, but not to a failure. The teachers' written comments showed that they strived to be fair to students. Teachers also seemed to indicate that a grade was a form of payment to students for work completed. That is, grades were something that students earned, as compensation for work completed. This finding suggests that teachers, either formally or informally, include conceptions of student effort in assigning grades. Because teachers are concerned with student motivation, self-esteem, and the social consequences of giving grades, using student achievement as the sole criterion for determining grades is rare. This finding is consistent with earlier work by Brookhart (1991), in which she pointed out that grading often consists of a hodgepodge of attitude, effort, and achievement. A limitation of this study is the small sample of elementary teachers (30) and the use of only three nonachievement factors in scenarios that participants responded to (effort/ability, missing work, and improvement). In addition, the teachers in the study were taking a university measurement course, which could result in socially desirable responses or answers that reflect the perspectives of the instructor.

Brookhart's conclusion concerning the variety of factors that go into grading is consistent with Cizek et al. (1995). Cizek and colleagues also found that teachers generally use a variety of objective and subjective factors to maximize the likelihood that students obtain good grades.

In summary, the literature specific to elementary teachers' assessment and grading practices is limited. There is an indication that teachers believe it is important to combine nonachievement factors, such as effort, ability, and conduct, with student achievement to determine grades; however, most of the studies that provide the basis for this conclusion have been conducted with secondary-level teachers. Although the studies are clear in this conclusion, less is known about how elementary teachers decide to weigh these nonachievement factors in determining grades and whether particular factors tend to be considered together or whether elementary teachers separate nonachievement factors, such as effort and improvement, from achievement. Also, in most of the surveys and other approaches in previous studies, researchers have asked teachers about their beliefs or projected behavior on the basis of scenarios. Actual assessment and grading practice may be different. Few researchers used a scale in which teachers indicated the actual use of different assessment and grading practices that allowed independent recording of the emphasis of each factor, rather than asking teachers to indicate the relative emphasis of each factor by percentage. Finally, there is no research on whether a relationship exists between the types of assessments used and grades received by students.

In the present study, we used a large sample of elementary school teachers to describe assessment and grading practices in a way that builds on and extends previous studies. We addressed four specific research questions, as follows:

1. What is the current state of assessment practice and grading by elementary teachers?
2. What are major assessment and grading components that are used by elementary teachers?
3. What is the relationship between assessment and grading practices and grades given to students?
4. What are the relationships between the independent variables grade level and subject taught (mathematics and language arts) and the dependent variables assessment and grading practices?

Method

Sample

The population included all Grades 3-5 regular elementary teachers in seven urban/metropolitan school districts near Richmond, Virginia (1,561 teachers in 124 schools). Completed surveys were returned by 921 elementary teachers from 105 schools. Twenty of the teachers were not full time in a regular classroom, resulting in a sample of 901. The response rate by school was 88%; by teachers, 58%.

Instrument

The purpose of the questionnaire was to document, using closed-form items, the extent to which teachers emphasized different assessment and grading practices. A 6-point scale, ranging from not at all to completely, was constructed to allow teachers to indicate usage without the constraints of an ipsative scale that is commonly used in this area (e.g., percentage each factor contributes to grades). Also, the questions were worded to emphasize actual teacher behaviors in relation to a specific class of students, rather than more global teacher beliefs. Teachers responded to all items once for language arts and once for mathematics. The stem for the items was "To what extent were final first semester grades of students in your single class described above based on."

The initial set of items was drawn from previous questionnaires that had been reported in the literature, as well as from research on teachers' assessment and grading practices (Brookhart, 1994; Frary, Cross, & Weber, 1993; Stiggins & Conklin, 1992). The items included factors that teachers consider in giving grades, such as student effort, improvement, academic performance, types of assessments used, and the cognitive level of the assessments (e.g., knowledge, application, reasoning). We strengthened content-related evidence for validity for the initial draft of 47 items by asking 15 elementary teachers to review the items for clarity and completeness in covering most, if not all, assessment and grading practices used. Appropriate revisions were made to the items, and a second pilot test with a school division outside of the sample was used to gather additional feedback on clarity, relationships among items, item response distributions, and reliability. Twenty-three teachers participated in the second pilot test. Item statistics were used to reduce the number of items to 27. Items that showed a very high correlation (r > .90) or minimum variation were eliminated, as well as items that were weak in reliability. We assessed reliability by asking the teachers in the second pilot test to retake the questionnaire following a 4-week interval. The stability estimate was done by examining the percentage of matches for the items. Those items that showed an exact match of less than 60% were deleted or combined with other items. The revised questionnaire included 34 items in the three categories (19 items assessing different factors used to determine grades, 11 items assessing different types of assessments used, and 4 items assessing the cognitive level of the assessments). The average exact match for the items was 46% of the teachers; 89% of the matches were within 1 point on the 6-point scale. In additional items, teachers were asked to indicate the approximate grade distribution of the class.

Procedure

School division central administrators communicated to teachers that they should complete the questionnaire and that they were responsible for distribution and collection. The questionnaire took about 15 min to complete. Teachers were assured that their responses would be confidential. No information was on the form that could be used to identify the teacher. The surveys were completed in early February, soon after the end of the first semester.

Data Analyses

The data analyses were primarily descriptive; we used frequencies, percentages, means, medians, standard deviations, and graphic presentations to summarize overall findings and trends. We used an exploratory factor analysis to reduce the number of components investigated within each of the three categories of items. Relationships between assessment and grading practices, grades given, grade level, and participants were examined through multiple regression and paired t tests.

Findings

The descriptive results are presented first, followed by relationships. The assessment and grading practices reported were organized by the three categories of items—factors used in grading, types of assessments used, and cognitive level of assessments.

Descriptive Results

The means and standard deviations for the assessment and grading practices items, divided by categories for both language arts and mathematics, are reported in Table 1. In Table 2, we present the frequency distributions of a few questions to illustrate the spread of responses across the different points in the scale.

The means and standard deviations in Table 1 show that, for this group of teachers as a whole, a few factors contributed very little, if anything, to grades (i.e., disruptive student behavior, grade distributions of other teachers, performance compared with other students, school division policy about the percentage of students who may obtain different grades, and extra credit for nonacademic performance). Also, a few factors clearly contributed most, ranging from quite a bit to extensively—academic performance as opposed to other factors, performance compared with a set scale of percentage correct, and specific learning
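The stability estimate described in the Instrument section (the percentage of exact matches between two administrations of the questionnaire, plus matches within 1 point on the 6-point scale) can be sketched in a few lines of Python. The responses below are invented for illustration; this is not the authors' data or procedure code:

```python
def stability(time1, time2):
    """Percentage of exact matches, and of matches within 1 scale point,
    between two administrations of the same items (1-6 scale)."""
    pairs = list(zip(time1, time2))
    exact = 100 * sum(a == b for a, b in pairs) / len(pairs)
    within_one = 100 * sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, within_one

# Hypothetical responses of one teacher to 10 items, 4 weeks apart.
first = [4, 3, 5, 2, 6, 4, 1, 3, 2, 5]
second = [4, 2, 5, 3, 6, 5, 1, 3, 2, 5]

exact, within_one = stability(first, second)
print(f"exact: {exact:.0f}%, within 1 point: {within_one:.0f}%")  # exact: 70%, within 1 point: 100%
```

Under the authors' rule, an item whose exact-match figure fell below 60% would be deleted or combined with another item.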


Table 1.—Means and Standard Deviations of All Items Measuring Assessment and Grading Practices for Elementary Teachers (N = 901)

                                                                      Mathematics      Language arts
Variable                                                              M       SD       M       SD

Factors used in determining grades
  Disruptive student behavior
  Improvement of performance since the beginning of the year
  Student effort—how much students tried to learn
  Ability levels of students                                          3.
  Work habits and neatness                                            2.
  Grade distributions of other teachers
  Completion of homework (n
  Quality of completed homework
  Academic performance, as opposed to other factors
  Performance compared with other students
  Performance compared with a set scale of percentage correct
  Performance compared with stu
  Specific learning objectives mastered
  Formal or informal school or district policy ... C's, D's, F's      1.50    1.15     1.50    1.14
  The degree to which students pay attention, participate
  Inclusion of zeros for incomplete assignments
  Extra credit for nonacademic performance (e.g.
  Extra credit for academic performance                               2.57    1.10
  Effort, improvement, behavior and other
Types of assessments used in determining grades
  Major examinations                                                  3.21    1.39     3.05    1.38
  Oral presentations                                                  2.37    1.11     3.03     .88
  Objective assessments (e.g., multiple choice, matching,
    short answer)                                                     3.82
  Performance assessments (e.g., structured teacher observations
    or ratings of performance, such as a speech or paper)             2.84    1.14     3.43     .93
  Assessments provided by publishers or supplied to the teacher
    (e.g., in instructional manuals)                                  3.54    1.05     3.22    1.06
  Assessments designed primarily by yourself                          3.63             3.90
  Essay-type questions                                                2.42    1.15     3.39    1.03
  Projects completed by teams of students                             2.51    1.03     2.91     .99
  Projects completed by individual students                           3.06    1.24     3.59     .96
  Performance on quizzes                                              3.93     .91     3.80     .98
  Authentic assessments (e.g., real world performance tasks)          2.95    1.08     2.89    1.06
Cognitive level of assessments used in determining grades
  Assessments that measure student recall knowledge                   3.65     .90     3.52     .86
  Assessments that measure student understanding                      4.46     .78     4.46     .77
  Assessments that measure how well students apply what they learn    4.31     .84     4.
  Assessments that measure student reasoning (higher order thinking)  3.99     .87     4
Table 2.—Percentages of Elementary Teachers' Responses to Selected Items for Mathematics

Question                                             Not at all  Very little  Some  Quite a bit  Extensively  Completely

Factors contributing to grades
  Improvement of performance since the
    beginning of the year                                13          17         38       21            7            2
  Student effort—how much students tried to learn         6          14         44       27            7            2
  Ability levels of students                             10          13         31       24           19            4
  Academic performance compared with other factors        2           3         12       29           44           10
Types of assessments used
  Objective assessments                                   2           8         28       36           21            5
  Performance assessments                                14          23         38       17            7            1
Cognitive level of assessments
  Assessments that measure student reasoning              0           2         25       44           25            4


objectives mastered. The remaining factors contributed some, more accurately ranging from very little to quite a bit. A fairly large standard deviation was reported for those items, showing considerable variation in the extent to which the factors were used for grading. A large percentage of teachers used effort in vastly different ways for grading. This same kind of dispersion of scores was evident in many of the factors. In Table 2, for example, 13% of elementary teachers reported using improvement not at all, whereas 30% of the teachers responded quite a bit, extensively, or completely. The extent to which ability level was used also was highly varied; 23% of the teachers responded not at all and 47% responded quite a bit, extensively, or completely. Given that the grading scales in the districts used in this study were based on how performance compares with a set scale of percentage correct (e.g., 94-100 A, 86-93 B), we were surprised to find that only 65% of the teachers responded that they used this extensively or completely.

The items in which teachers indicated the types of assessments used show that they did not rely on a single kind of assessment. Rather, many different types of assessments appear to have been used. Although objective assessments were employed most frequently for both mathematics and language arts (means of 3.75 and 3.82, respectively), performance assessments and projects were used almost as much in language arts as objective items (means of 3.43 and 3.59, respectively). Assessments in mathematics included fewer performance assessments and projects (means of 2.84 and 2.51, respectively). There was great reliance on assessments prepared by the teachers themselves, but also considerable use of assessments provided by publishers (language arts means of 3.90 and 3.22, and mathematics means of 3.63 and 3.54). The standard deviations with respect to types of assessments (about 1 point on the scale) pointed to considerable variation.

Cognitive levels of assessments were very similar for mathematics and language arts. The lowest rated assessments, in terms of use, were those that measure student recall knowledge. The highest rated assessment was student understanding, with application and reasoning in between. For the three highest rated items, the means were around 4 on the scale (used quite a bit).

Grades Awarded

The results for percentages of different grades awarded by elementary teachers are presented in Table 3. The table was broken out by grade level and subject matter, as well as by letter grade awarded. Percentages were estimated by teachers and therefore may not sum to 100%.

Grades A, B, and C were most typically awarded by the teachers, comprising more than 70% of the total grades given. Grades D and F comprised less than 10% of total grades given. A grade of B was most typically awarded by teachers, accounting for approximately 32 to 35% of total grades given. Grades A and C were nearly equally distributed, accounting for approximately 40% of a combined total; grades of A comprised approximately 18 to 24% of total grades given, and grades of C comprised approximately 21 to 25% of total grades. Grades of D awarded were between 6 and 8% of the total, whereas grades of F were less than 3% of the total grades given. The relatively large variability of teacher responses was illustrated by the standard deviations.

Table 4 shows a summary of results of another procedure to examine variability, by comparing variability within schools to variability between schools. To calculate the average standard deviation within schools, we used the responses of teachers from the same school to derive a standard deviation score for that school for each item. We then averaged 105 standard deviations, one for each of 105 schools, to result in within-schools variability. We calculated between-schools variability by using the mean for each school, considering that as a single score, and then calculating the standard deviation of the means. The results of these analyses for three items, and percentage of A's awarded, are summarized in Table 4. In each case, the average variation within schools was greater than the variation between schools. Even though this result was influenced by the relatively low number of teachers in each school, which would increase the variation, it still suggests that teachers in the same school differed more, on average, than did responses compared at the school level.

Figure 1 illustrates the frequency of mean percentage mathematics A's awarded between schools. It shows that

Table 3.—Percentages of Different Semester Grades Awarded by Elementary Teachers (N = 859)

                       A               B               C               D               F
Grade level       Math    LA      Math    LA      Math    LA      Math    LA      Math    LA

3 (n = 294)      23.97   22.56   35.02   34.20   21.33   23.77    6.29    6.24    2.24    1.78
4 (n = 258)      21.39   22.27   35.50   34.73   24.52   24.35    6.67    6.22    2.90    2.33
5 (n = 205)      17.56   20.83   33.32   32.24   23.60   23.90    7.54    7.31    2.62    2.33
Mixed (n = 102)  19.56   18.52   32.18   33.08   21.91   22.48    7.11    6.07    2.54    1.75
Total            20.62   21.05   34.00   33.56   22.84   23.63    6.90    6.46    2.58    2.05

Note. Math = mathematics; LA = language arts. Percentages were estimated by the teachers.
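The within- versus between-schools variability comparison described in the text can be sketched as follows. The school responses below are invented for illustration, and the use of the sample standard deviation is an assumption (the article does not specify which formula was applied):

```python
from statistics import mean, stdev

def within_between(schools):
    """schools: one list of teacher responses per school, for a single item.
    Returns (average within-school SD, SD of the per-school means)."""
    within = mean(stdev(scores) for scores in schools)     # average SD inside each school
    between = stdev(mean(scores) for scores in schools)    # SD of the school means
    return within, between

# Three hypothetical schools' teacher responses to one 6-point item.
schools = [[2, 4, 6], [3, 3, 5], [1, 5, 4]]
within, between = within_between(schools)
print(within > between)  # True: teachers vary more within schools than schools vary from each other
```

As in the authors' analysis, a larger first value than second indicates that teachers in the same school differ more, on average, than school-level means differ from each other.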


mathematics grades awarded were 12% or less for 35 elementary schools, whereas for 20 schools, the percentage of mathematics A's awarded was 32%. It also shows a large between-schools variation in the number of A's awarded.

Data Reduction

Before examining the relationships between subject (mathematics compared with language arts) and grade level (Grades 3, 4, and 5), we performed data reduction for each of the major categories of items (factors, types, and cognitive levels) for both mathematics and language arts. The first step in the data reduction was to eliminate items that showed a floor effect with little variability. We used the remaining items in the second step of the data reduction, an exploratory factor analysis, to identify relationships among the items by reducing them to a few relatively independent, but conceptually meaningful, composite variables called components. A varimax rotation was used for the factor analyses.

Table 4. Variation Within and Between Elementary Schools for Selected Items (N = 105)

                                             Mean variation   Mean variation
Question                                     within           between

% A's awarded in mathematics                 16.2             10.4
Student effort: how much students
  tried to learn                              .92              .57
Assessments that measure student
  reasoning                                   .81              .42
Objective assessments                         .97              .51

The factor analysis for items used in grading (factors) resulted in six components. There were no differences between mathematics and language arts. The loadings of different items are summarized in Table 5.

The first component was comprised of three items that emphasized effort, ability, improvement, work habits, attention, and participation. These items could be considered enablers to academic performance, important indicators to teachers to judge the degree to which the student has tried to learn and, by implication, actually learned. A second component was defined by the two items that included questions about homework. The third component was loaded on one item concerning grade distributions of other teachers. The fourth component included three items that focused on academic performance of the student. The fifth component was loaded highly on two items that included comparisons with other students. The sixth component included the suggestion that borderline work and using extra credit are related and distinct from other factors. Thus, there appear to be six conceptually meaningful variables that elementary teachers use when grading students for both language arts and mathematics. These variables include actual performance; effort, ability, and improvement; homework; other teachers' grading; comparisons with other students; and borderline cases. Given the relatively low emphasis on homework, comparisons with other students, and other teachers' grading, and the infrequent occurrence of borderline cases, these results suggest that teachers conceptualize two major ingredients: actual performance, and effort, ability, and improvement. Of these two, academic performance clearly is most important, but effort, ability, and improvement remain fairly important, especially for some teachers.

The factor analysis for types of assessments used resulted in three components. The item loadings were, for the most part, the same for both subjects. The first component was comprised of six items for mathematics types and four items for language arts types, each of which described some kind of constructed-response assessment, such as essays (mathematics only), projects, and performance assessments. The second component, made up of either two or three items, included objective assessments, quizzes (language arts only), and assessments provided by publishers. Evidently, items provided by publishers are used in both quizzes and objective assessments. The third component was comprised of two items for mathematics (major examinations and teacher-made tests) and two items for language arts (teacher-made tests and essays). This result suggests that the common element in the third component was teacher made. For mathematics, and for language arts essays, the major examinations tend to be teacher made.

The factor analysis for cognitive levels showed high intercorrelation among the three items that suggested higher order knowledge and skills (understanding, reasoning,


Table 5. Factor Loadings for Elementary Teachers' Assessment and Grading Practices

Factors used in grading


Improvement of performance since the beginning of the year .777
Student effort: how much students tried to learn .809
Ability levels of students .644
Completion of homework (not graded) .819
Quality of completed homework .750
Grade distributions of other teachers .660
Academic performance, as opposed to other factors .710
Performance compared with a set scale of percentage correct
(e.g., 86-94% B) .668
Specific learning objectives mastered .683
Performance compared with other students in the class .781
Performance compared with students from previous years .741
Extra credit for academic performance .730
Effort, improvement, behavior, and other nontest indicators for
borderline cases .659
Types of assessments
Oral presentations .704
Performance assessments (e.g., structured teacher observations
or ratings of performance, such as a speech or paper) .746
Essay-type questions (mathematics only) .740
Projects completed by teams of students .819
Projects completed by individual students .712
Authentic assessments (e.g., real world performance tasks) .636
Objective assessments (e.g., multiple choice, matching, short
answer) .736
Assessments provided by publishers or supplied to the teacher
(e.g., in instructional guides or manuals) .776
Major examinations (mathematics only) .691
Assessments designed primarily by yourself .721
Essay-type questions (language arts only) .672
Cognitive level
Assessments that measure student understanding .836
Assessments that measure student reasoning (higher order
thinking) .795
Assessments that measure how well students apply what they
learn .818

Note. Unless otherwise noted, factor loadings for mathematics and language arts were averaged.
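Rotated loadings like those in Table 5 come from applying a varimax rotation to an initial factor solution. The sketch below shows only the rotation step, using a standard SVD-based implementation of Kaiser's varimax criterion on a hypothetical 6-item, 2-factor loading matrix; it is an illustration of the technique named in the text, not the authors' analysis code.

```python
import numpy as np

def varimax(loadings, n_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)          # rotation matrix, refined iteratively
    d_old = 0.0
    for _ in range(n_iter):
        L = loadings @ R
        # SVD update of the varimax criterion (Kaiser normalization omitted)
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        )
        R = u @ vt
        d = s.sum()
        if d_old != 0 and d < d_old * (1 + tol):
            break
        d_old = d
    return loadings @ R

# Hypothetical unrotated loadings: 6 items, 2 factors.
A = np.array([
    [0.70, 0.30], [0.65, 0.25], [0.60, 0.35],
    [0.30, 0.70], [0.25, 0.65], [0.35, 0.60],
])
rotated = varimax(A)

# An orthogonal rotation leaves each item's communality unchanged.
print(np.allclose((A ** 2).sum(axis=1), (rotated ** 2).sum(axis=1)))
```

The rotation redistributes variance so each item loads strongly on one factor and weakly on the others, which is what makes component labels such as "academic performance" or "homework" readable from a table of loadings.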

and application). Teachers tended to think about these items as one kind of skill, apart from recall knowledge, which did not load on this analysis.

Relationship Results

In the relationship analyses for subject matter and grade level, we used paired t tests and analysis of variance, respectively, with standardized component scores for the items loading on each of the 10 components derived from the factor analyses, plus the percentage of A's given, as dependent variables. We also performed a regression analysis to determine if assessment and grading practices predict grades. Thus, there were two independent variables in the first two analyses: subject matter, with two levels, and grade level, with three levels; we used all 10 components as independent variables to predict the percentage of A's awarded.

The t test analyses showed that there were few differences between language arts and mathematics assessment and grading practices, despite the large sample size that would have made it easy to detect statistically significant differences. Clearly, there was more in common than there was different on the basis of these two content areas. As might be expected, differences occurred for the extent to which performance assessments were used (mean of 2.33 for mathematics and 3.41 for language arts), projects completed by individual students (mean of 3.01 for mathematics and 3.56 for language arts), and the use of assessments provided by publishers (mean of 3.56 for mathematics and 3.23 for language arts). Thus, only three items in the category types of assessments showed a difference between mathematics and language arts. When considering other factors such as effort, participation, homework, and so forth, as well as cognitive levels, we found no difference between the mathematics and language arts responses.


Table 6. Relationship of Grade Levels (3, 4, and 5) to Assessment Practices of Elementary Teachers (N = 873)

Mathematics

No relationship                             Positive relationship               Negative relationship
Effort, ability, and improvement            Homework                            Percentage A's
Academic performance                        Extra credit
Teacher-made major examinations             Constructed-response assessments
Higher order thinking and application       Objective assessments

Language arts

No relationship                             Positive relationship               Negative relationship
Effort, ability, and improvement            Homework                            Percentage A's
Academic performance                        Extra credit
Objective assessments                       Constructed-response assessments
Higher order thinking and application       Teacher-made major examinations
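The one-way analyses of variance behind Table 6 compare between-group and within-group mean squares. The sketch below computes the F statistic for hypothetical component scores from three grade-level groups; it illustrates the statistic only, not the study's data or the Scheffé follow-up tests.

```python
from statistics import mean

def one_way_f(groups):
    """F statistic for a one-way ANOVA across a list of groups."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical component scores for Grades 3, 4, and 5.
f = one_way_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f)  # -> 3.0
```

A large F indicates that grade-level means differ more than within-grade scatter would predict; a post hoc test such as Scheffé's then identifies which grades differ.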

Table 7. Factors, Types of Assessments, and Cognitive Levels as Predictor Variables of Percentage A's Awarded for Elementary Teachers

Subject                     R      Significant positive relationship         Significant negative relationship

Mathematics (n = 714)      .21     Higher order thinking and application     Objective assessments
                                                                             Publisher-provided items
                                                                             Homework
Language arts (n = 731)    .20     Constructed-response items                Extra credit
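The multiple correlation coefficients in Table 7 (.21 and .20) express how well the fitted regression reproduces the percentage of A's awarded. The sketch below computes R by ordinary least squares on hypothetical data; the study itself used stepwise selection, which this illustration does not reproduce.

```python
import numpy as np

def multiple_r(X, y):
    """Multiple correlation: agreement between y and its least-squares fit."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    y_hat = X1 @ beta
    ss_res = ((y - y_hat) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return np.sqrt(1 - ss_res / ss_tot)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))              # hypothetical component scores
y = 0.2 * X[:, 0] + rng.normal(size=200)   # weak signal, as in the study
print(round(float(multiple_r(X, y)), 2))
```

An R near .20 means the predictors explain only about 4% of the variance in A's awarded (R squared), which is why the authors describe the regressions as having little practical significance.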

We used one-way analyses of variance with Scheffé post hoc tests to examine the relationship between grade level and assessment and grading practices. The results of these analyses are shown in Table 6, which contains a summary of the components that indicate no relationship, those that show a positive relationship, and the single variable that shows a negative relationship.

As with other analyses, the major finding was no difference between grade levels on components that were most important to assessment and grading. For both language arts and mathematics, the results showed that as grade level increases, so does the importance of homework, extra credit, and constructed-response assessments. For mathematics, the importance of objective assessments showed a positive relationship with grade level. In language arts, teacher-made major examinations contributed more in higher grades. The only negative relationship was found in the percentage of A's awarded, which means that fewer A's were awarded in higher grades.

The predictive relationship between assessment and grading practices was examined with stepwise multiple regression: one for language arts and one for mathematics, with percentage of A's awarded as the dependent variable and the eight weighted component scores as independent variables. The results of these regressions are summarized in Table 7.

The multiple correlation coefficients were relatively small in both regressions, suggesting that the major predictors of grades were not the weight given to different factors, types of assessments, or cognitive level of assessments. Given that finding, the percentage of A's awarded tended to increase with increased weight given to higher order thinking assessments for mathematics, and to constructed-response assessments for language arts. Negative relationships were found with objective assessments, publisher-provided items, and homework for mathematics, and with extra credit for language arts.


tribute little to grading. A substantial percentage of elementary teachers include zeros in the calculation of grades. Because there are different ways this can be done, this finding suggests a need to explore in greater depth more specifically how this is accomplished.

Three major types of assessments are used: constructed-response, such as projects, essays, and presentations; objective assessments; and teacher-made major examinations. Although objective assessments are used most frequently, there is also a great reliance on constructed-response types of assessments. Teachers tend to differentiate the cognitive level of their assessments into two categories: recall knowledge, and higher order thinking and application. Higher order thinking and application are emphasized heavily. There is a significant reliance on assessments that are designed by publishers, even if most assessment is teacher made. This finding suggests that teachers need training in how to evaluate the quality of their own assessments, as well as those provided by others.

Along with the variety of factors that go into grading, great variation exists within schools concerning the extent to which teachers emphasize different factors in grading students. The finding that within-school variance is greater than between-school variance suggests that individual teacher preferences are more important than are differences between schools in determining grading practices. These data suggest that teachers vary considerably in how they weigh different factors, even within the same building, and that school and student characteristics as a whole are less important than are individual beliefs. This finding is consistent with the highly variable results found by Cizek et al. (1995) and McMillan and Nash (2000), confirming that an important characteristic of classroom assessment and grading practices is that they are highly individualized and may be unique from one teacher to another, even in the same school. It may be useful for teachers to discuss such differences and consider whether more consistency would provide students with a clearer message about what is important.

An implication of this variability of grading is that there seems to be only moderate consistency among teachers about what is most important and how different factors are weighted. Why is this the case? One possible answer is that teachers view grading as an extension of teaching to promote student success in general in many areas important to schooling, including both academic achievement and academic-enabling behaviors, such as responsibility, effort, improvement, participation, and cooperation. If teachers would award grades solely on the basis of academic performance, there might be much more consistency. This finding also may explain why teachers' grading practices do not follow established measurement principles. McMillan and Nash (2000) found that teachers base their grading practices on their educational philosophies and on what is best for each student. Considering differences in styles of teaching, types of students, and curriculum, perhaps it is not surprising that there is high variability of grading practices. From a policy perspective, an appropriate question is whether there is a good reason to permit or even encourage differences in grading practices, or whether there is a need for educational measurement experts to modify recommendations to be more consistent with the realities of teaching and the multitude of factors that influence teacher behavior. An implication for the preparation of teachers is that it is important to train them to integrate assessment meaningfully into instruction. Another implication for teacher training is to explore with students the importance of conceptualizing assessment as a process involving a multitude of factors, combined in unique ways that may differ for each individual.

The factor analysis indicated that several student behaviors considered together indicate what has been labeled here as an Academic Enabling factor. These behaviors include student effort, participation, improvement, ability, and discussion. It appears that teachers use all of these behaviors, apart from actual academic performance, as important indicators for determining grades. This finding has theoretical significance when one is conceptualizing the nature of student effort in relation to assessment. Brookhart (1997), for example, has identified mental effort and realized student effort as a major component of a theoretical framework for investigating the effects of classroom assessment. The present findings suggest that the manner in which effort is defined in this model may need to be broadened to include some indication of improvement and ability. It may well be that teachers think about effort as mediated by improvement and ability. That is, what is regarded as high or low effort is dependent on what knowledge and skills students bring to the classroom and on a more general impression of ability.

It is clear that many teachers use academically enabling behaviors, such as participation, effort, and improvement, extensively to determine grades, whereas other teachers do not use these variables very much. This finding may reveal educational philosophies or approaches that give different messages to students. For example, teachers who reward effort may be allowing students who are not competent, as well as their parents, to believe that they demonstrate needed knowledge and skills. This action would be especially troubling if it occurred more with low-socioeconomic status (SES) students, who could be rewarded for their effort to maintain involvement, because they might not be provided with feedback that more accurately indicates their level of knowledge and skills. Also, are teachers who weight effort more in reality "coddling" students, making it easier for them to obtain passing grades? Some support for this implication is found in a study by Cauley and McMillan (2000), who found that middle school teachers at low-SES schools (determined by the percentage of students eligible for free and reduced-price lunches) tend to use nontest factors more in grading than do teachers at high-SES schools.

Although few relationships were reported between assessment practices and grade level, greater emphasis in later grades is placed on homework, extra credit, constructed-response assessments, objective assessments, and major


examinations. Other practices, such as effort, ability, improvement, and academic performance, are emphasized the same in all three grade levels. Teachers who award more A's use fewer objective assessments, fewer publisher-provided tests, less homework, and more assessments that measure reasoning and application. There was no relationship between the extent to which effort, improvement, ability, academic performance, homework, and extra credit were emphasized, and the percentage of A's awarded. Overall, the analyses designed to predict assessment and grading practices fell short of much practical significance. We know that these practices do not vary much according to grade level (3-5) or subject matter (mathematics and language arts). However, there was considerable variation between schools in the percentage of A's awarded. Further research concerning these differences is warranted to explore possible reasons.

The results of this study are limited by several variables, including demographics and location (Virginia is in the midst of a statewide assessment program consisting of all multiple-choice tests, with the exception of writing), using Grades 3, 4, and 5 with language arts and mathematics but not other subjects, and the fact that the data, including grades awarded, were based on teacher self-reports. In addition, the analyses were unable to tease out possible effects due to the SES of each school. However, the large and comprehensive nature of the sample still suggests strong external validity. The self-reports were based on actual practice with a specific class, not more generic beliefs, and represented inner-city, suburban, and rural schools. Researchers who investigate assessment practices may find that the components identified are useful categories for asking questions and relating assessment and grading practices to student motivation and achievement.

NOTES

This research was supported by the Metropolitan Educational Research Consortium, Virginia Commonwealth University, Richmond, Virginia. The findings do not represent the views of members of the consortium.

The authors appreciate the reviewers' suggestions for needed improvements in earlier drafts.

REFERENCES

Airasian, P. W. (1984). Classroom assessment and educational improvement. Paper presented at the conference, Classroom Assessment: A Key to Educational Excellence, Northwest Regional Educational Laboratory, Portland, OR.
American Federation of Teachers, National Council on Measurement in Education, and National Education Association. (1990). Standards for teacher competence in educational measurement. Washington, DC: Author.
Brookhart, S. M. (1991). Grading practices and validity. Educational Measurement: Issues and Practice, 10, 35-36.
Brookhart, S. M. (1993). Teachers' grading practices: Meaning and values. Journal of Educational Measurement, 30, 123-142.
Brookhart, S. M. (1994). Teachers' grading: Practice and theory. Applied Measurement in Education, 7, 279-301.
Brookhart, S. M. (1997). A theoretical framework for the role of classroom assessment in motivating student effort and achievement. Applied Measurement in Education, 10, 161-180.
Cauley, K. M., & McMillan, J. H. (2000, April). Do teachers grade differently in low SES middle schools? Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Cizek, G. J., Fitzgerald, S. M., & Rachor, R. E. (1995). Teachers' assessment practices: Preparation, isolation and the kitchen sink. Educational Assessment, 3(2), 159-179.
Cross, L. H., & Frary, R. B. (1996, April). Hodgepodge grading: Endorsed by students and teachers alike. Paper presented at the annual meeting of the National Council on Measurement in Education, New York.
Fleming, M., & Chambers, B. (1983). Teacher-made tests: Windows on the classroom. In W. E. Hathaway (Ed.), Testing in the schools: New directions for testing and measurement (pp. 29-38). San Francisco: Jossey-Bass.
Frary, R. B., Cross, L. H., & Weber, L. J. (1993). Testing and grading practices and opinions of secondary teachers of academic subjects: Implications for instruction in measurement. Educational Measurement: Issues and Practice, 12(3), 23-30.
Gullickson, A. R. (1985). Student evaluation techniques and their relationship to grade and curriculum. The Journal of Educational Research, 79(2), 96-100.
Gullickson, A. R. (1993). Matching measurement instruction to classroom-based evaluation: Perceived discrepancies, needs, and challenges. In S. T. Wise (Ed.), Teacher training in measurement and assessment skills (pp. 1-25). Lincoln, NE: Buros Institute of Mental Measurement.
Marso, R. N., & Pigge, F. L. (1993). Teachers' testing knowledge, skills, and practices. In S. T. Wise (Ed.), Teacher training in measurement and assessment skills. Lincoln, NE: Buros Institute of Mental Measurement.
McMillan, J. H., & Nash, S. (2000, April). Teacher classroom assessment and grading practices decision making. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Plake, B. S., & Impara, J. C. (1993). Assessment competencies of teachers: A national survey. Educational Measurement: Issues and Practice, 12, 10-25.
Plake, B. S., & Impara, J. C. (1997). Teacher assessment literacy: What do teachers know about assessment? In G. D. Phye (Ed.), Handbook of classroom assessment (pp. 55-68). New York: Academic Press.
Stiggins, R. J., & Bridgeford, R. J. (1985). The ecology of classroom assessment. Journal of Educational Measurement, 22(4), 271-286.
Stiggins, R. J., & Conklin, N. F. (1992). In teachers' hands: Investigating the practices of classroom assessment. Albany: State University of New York Press.
Stiggins, R. J., Frisbie, D. A., & Griswold, P. A. (1989). Inside high school: Building a research agenda. Educational Measurement: Issues and Practice, 8, 5-14.
Tittle, C. K. (1994). Toward an educational psychology of assessment for teaching and learning: Theories, contexts, and validation arguments. Educational Psychologist, 29, 149-162.
