ASSESSMENT IN LEARNING 2
MODULE 2. Performance-Based Assessment
Lesson Intended Learning Outcome. At the end of the lesson, you are expected to develop a
portfolio of performance-based assessment tools that will measure students’ competencies in a
particular subject.
Engage. Designing performance-based assessment tools entails critical processes, which start with identifying the tasks that the teacher wants to assess.
Example of a Group Developmental Record Sheet for student teachers majoring in the Biological Sciences. Attributes are rated on a scale of 5 (highest) to 1 (lowest).
The Interview Sheet is another observation tool which is also called the conference recording
form. Interview sheets consist of a list of questions the teacher intends to ask and space for recording the
student’s answers.
Example of an Interview Sheet
(c) Instructional supervisors are able to evaluate the strengths and weaknesses of the
academic program.
Academic Competencies
1. can understand printed materials
2. can use research and library skills
3. can use technology in preparing oral presentation
4. can use scientific method in solving problems
5. can write and speak effectively in English and Filipino
Oral questioning is an appropriate assessment method for actual performance when the
objectives are: (a) to assess the student's stock knowledge and/or (b) to determine the student's ability to
communicate ideas in coherent verbal sentences. While oral questioning is indeed an option for
assessment, several factors need to be considered when using it. Of particular significance are
the student's state of mind and feelings; anxiety and nervousness in making an oral presentation could
mask the student's true ability.
Observations and self-reports need a tally sheet as a device when used by the teacher to record
the frequency of student behaviours, activities or remarks. A self-checklist is a list of several
characteristics or activities presented to the subjects of a study. The students are asked to study the list
and then to place a mark opposite the characteristics that they possess or activities in which they have
engaged for a particular length of time. Observation and self-reports are useful supplementary assessment
methods when used in conjunction with oral questioning and performance tests.
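A tally sheet of the kind described above can be sketched as a simple frequency count. This is only an illustration; the behaviour labels below are hypothetical examples of what a teacher might record.

```python
from collections import Counter

# Hypothetical behaviour labels; in practice these come from the
# teacher's tally sheet. Each observed behaviour is tallied as it occurs.
observations = [
    "asks a question",
    "volunteers an answer",
    "asks a question",
    "helps a classmate",
    "asks a question",
]

# Counter turns the raw observation log into frequency counts per behaviour.
tally = Counter(observations)

# Report behaviours from most to least frequent.
for behaviour, count in tally.most_common():
    print(f"{behaviour}: {count}")
```

The same structure works for a self-checklist: replace the observation log with the items each student marked, and the counts show how many students reported each characteristic.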
competencies for this particular task. As in the statement of objectives using Bloom's taxonomy, the
specific objectives also range from simple observable processes to more complex observable processes, e.g.
creating an ambiance for the poem through appropriate rising and falling intonation. A competency is said
to be more complex when it consists of two or more skills.
The following competencies are simple competencies:
. Speak with well-modulated voice;
. Draw a straight line from one point to another point;
. Color a leaf with a green crayon.
2. Task Designing
Learning tasks need to be carefully planned. In particular, the teacher must ensure that the
particular learning process to be observed contributes to the overall understanding of the subject or
course.
Some generally accepted standards for designing a task include:
Identifying an activity that would highlight the competencies to be evaluated, e.g. reciting a
poem, writing an essay, manipulating a microscope, etc.
Identifying an activity that would entail more or less the same sets of competencies. If an
activity would result in too many possible competencies, then the teacher would have difficulty
assessing the student’s competency on the task.
Finding a task that would be interesting and enjoyable for the students. Tasks such as writing
an essay are often boring and cumbersome for the students.
Example: The topic is on understanding biological diversity.
Possible Task Design: Bring the students to a pond or creek. Ask them to find all the living
organisms they can near the pond or creek. Also, bring them to the school playground to find
as many living organisms as they can. Observe how the students develop a system for finding the
organisms, classifying them and drawing conclusions about the differences in biological diversity
between the two sites.
Science laboratory classes are particularly suitable for a process-oriented performance-based
assessment technique.
3. Scoring Rubrics
A rubric is a scoring scale used to assess student performance along a task-specific set of
criteria. Authentic assessments typically are criterion-referenced measures, that is, a student’s aptitude on
a task is determined by matching the student’s performance against a set of criteria to determine the
degree to which the student’s performance meets the criteria for the task. To measure student
performance against a pre-determined set of criteria, a rubric, or scoring scale, is typically created which
contains the essential criteria for the task and appropriate levels of performance for each criterion. For
example, the following rubric (scoring scale) covers the recitation portion of a task in English.
Recitation Rubric

Criteria                    Weight   1                         2                        3
Number of appropriate       x1       1-4                       5-9                      10-12
hand gestures
Appropriate facial          x2       Lots of inappropriate     Few inappropriate        No apparent inappropriate
expression                           facial expressions        facial expressions       facial expressions
Voice inflection            x2       Monotone voice used       Can vary voice           Can easily vary voice
                                                               inflection with          inflection
                                                               difficulty
Incorporates proper         x3       Recitation contains       Recitation has some      Recitation fully captures
ambiance through                     very little feeling       feelings in the voice    ambiance through feelings
feelings in the voice                in the voice                                       in the voice
As the given example shows, a rubric is composed of two components: criteria and levels of
performance. Each rubric has at least two criteria and at least two levels of performance. The criteria,
characteristics of good performance on a task, are listed in the left-hand column of the illustrated rubric
(number of hand gestures, facial expression, voice inflection and ambiance). As is common in
rubrics, a shorthand is used for each criterion so that it fits easily into the table. The full criteria are
statements of performance such as "includes a sufficient number of hand gestures" and "recitation captures
the ambiance through appropriate feelings and tone in the voice".
For each criterion, the evaluator applying the rubric determines to what degree the student has
met the criterion, i.e., the level of performance. In the given rubric, there are three levels of performance
for each criterion. For example, the recitation can contain lots of inappropriate, few inappropriate or no
apparent inappropriate facial expressions.
Finally, the illustrated rubric contains a mechanism for assigning a score to each performance.
(Assessments and their accompanying rubrics can be used for purposes other than evaluation and, thus, do
not have to have points or grades attached to them.) In the second column from the left, a weight is
assigned to each criterion. Students can receive 1, 2 or 3 points for "number of appropriate hand gestures."
But appropriate ambiance, more important in this teacher's mind, is weighted three times (x3) as heavily.
So, students can receive 3, 6 or 9 points (i.e., 1, 2 or 3 times 3) for the level of appropriateness on this criterion.
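The weighted scoring just described can be sketched as a small computation. The criterion labels are shorthand for the rubric criteria, and the sample student's ratings are hypothetical.

```python
# Weights from the recitation rubric: ambiance counts three times as
# heavily as hand gestures.
WEIGHTS = {
    "hand gestures": 1,
    "facial expression": 2,
    "voice inflection": 2,
    "ambiance": 3,
}

def rubric_score(levels):
    """Weighted total: each criterion's level (1-3) times its weight."""
    return sum(WEIGHTS[criterion] * level for criterion, level in levels.items())

# A hypothetical student rated on each criterion:
student = {
    "hand gestures": 3,      # 10-12 gestures
    "facial expression": 2,  # few inappropriate expressions
    "voice inflection": 3,   # easily varies inflection
    "ambiance": 2,           # some feelings in the voice
}
print(rubric_score(student))  # 1*3 + 2*2 + 2*3 + 3*2 = 19
```

With this weighting the maximum possible score is 24 (all criteria at level 3) and the minimum is 8 (all criteria at level 1).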
Descriptors
The rubric includes another common, but not necessary, component of rubrics – descriptors.
Descriptors spell out what is expected of students at each level of performance for each criterion. In the
given example, “lots of inappropriate facial expressions,” “monotone voice used” are descriptors. A
descriptor tells students more precisely what performance looks like at each level and how their work
may be distinguished from the work of others for each criterion. Similarly, the descriptors help the
teacher more precisely and consistently distinguish between student work.
Why Include Levels of Performance?
1. Clearer expectations
It is very useful for the students and the teacher if the criteria are identified and
communicated prior to completion of the tasks. Students know what is expected of them and
teachers know what to look for in student performance. Similarly, students better understand
what good (or bad) performance on a task looks like if levels of performance are identified,
particularly if descriptors for each level are included.
2. More consistent and objective assessment
In addition to better communicating teacher expectations, levels of performance permit the
teacher to more consistently and objectively distinguish between good and bad performance,
or between superior, mediocre and poor performance, when evaluating student work.
3. Better feedback
Furthermore, identifying specific levels of student performance allows the teacher to
provide more detailed feedback to students. The teacher and the students can more clearly
recognize areas that need improvement.
Analytic Versus Holistic Rubrics
For a particular task you assign students, do you want to be able to assess how well the students
perform on each criterion, or do you want to get a more global picture of each student's performance on
the entire task? The answer to that question is likely to determine the type of rubric you choose to create
or use: Analytic or holistic.
Analytic rubric
Most rubrics, like the Recitation rubric mentioned, are analytic rubrics. An analytic rubric
articulates levels of performance for each criterion so the teacher can assess student performance on each
criterion. Using the Recitation rubric, a teacher could assess whether a student has done a poor, good or
excellent job of “creating ambiance” and distinguish that from how well the student did on “voice
inflection.”
Holistic rubric
In contrast, a holistic rubric does not list separate levels of performance for each criterion.
Instead, a holistic rubric assigns a level of performance by assessing performance across multiple criteria
as a whole. For example, the analytic recitation rubric above can be turned into a holistic rubric:
3 – Excellent Speaker
. included 10-12 changes in hand gestures
. no apparent inappropriate facial expressions
. utilized proper voice inflection
. created the proper ambiance for the poem
2 – Good Speaker
. included 5-9 changes in hand gestures
. few inappropriate facial expressions
. had some inappropriate voice inflection changes
. almost created the proper ambiance
1 – Poor Speaker
. included 1-4 changes in hand gestures
. lots of inappropriate facial expressions
. used a monotone voice
. did not create the proper ambiance
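A holistic judgment like the one above can be sketched as a single decision over all criteria at once. The threshold checks below are a simplified, hypothetical reading of the descriptors, not a prescribed scoring formula.

```python
# The whole performance is matched against each level, from best to
# worst, and the first level whose descriptors fit is assigned.

def holistic_level(gestures, inappropriate_faces, varied_voice, ambiance):
    """Return 3 (Excellent), 2 (Good) or 1 (Poor) for the speaker."""
    if gestures >= 10 and inappropriate_faces == 0 and varied_voice and ambiance:
        return 3  # Excellent Speaker
    if gestures >= 5 and inappropriate_faces <= 3:
        return 2  # Good Speaker
    return 1      # Poor Speaker

print(holistic_level(gestures=11, inappropriate_faces=0, varied_voice=True, ambiance=True))   # 3
print(holistic_level(gestures=6, inappropriate_faces=2, varied_voice=False, ambiance=False))  # 2
print(holistic_level(gestures=3, inappropriate_faces=7, varied_voice=False, ambiance=False))  # 1
```

Note what is lost relative to the analytic version: the single level no longer tells the student which criterion pulled the rating down.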
Although these three levels may not capture all the variations in student performance on a
criterion, they may provide sufficient discrimination for your purposes. Or, at the least, they are a place to start. Upon
applying the three levels of performance, you might discover that you can effectively group your
students' performance into these three categories. Furthermore, you might discover that the labels of
"never", "sometimes" and "always" sufficiently communicate to your students the degree to which they
can improve on making eye contact.
On the other hand, after applying the rubric you might discover that you cannot effectively
discriminate among student performances with just three levels of performance. Perhaps, in your view,
many students fall in between never and sometimes, or between sometimes and always, and neither label
accurately captures their performance. So, at this point, you may decide to expand the number of levels of
performance to include never, rarely, sometimes, usually and always.
Makes eye contact    Never    Rarely    Sometimes    Usually    Always
There is no “right” answer as to how many levels of performance there should be for a criterion in
an analytic rubric; that will depend on the nature of the task assigned, the criteria being evaluated, the
students involved and your purposes and preferences. For example, another teacher might decide to leave
off the “always” level in the above rubric because “usually” is as much as normally can be expected or
even wanted in some instances. Thus, the “makes eye contact” portion of the rubric for that teacher might
be:
Makes eye contact    Never    Rarely    Sometimes    Usually
It is recommended that fewer levels of performance be included initially because a smaller rubric is:
. easier and quicker to administer
. easier to explain to students (and others)
. easier to expand than a larger rubric is to shrink
Level 1: Does the finished product or project illustrate the minimum expected parts or
functions? (Beginner level)
Level 2: Does the finished product or project contain additional parts and functions on
top of the minimum requirements which tend to enhance the final output? (Skilled
level)
Level 3: Does the finished product contain the basic minimum parts and functions,
have additional features on top of the minimum, and is it aesthetically pleasing? (Expert
level)
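The three product levels above form a cumulative decision: each higher level presupposes the one below it. As a sketch, the rater's judgments can be represented by hypothetical boolean flags.

```python
# Hypothetical rater judgments: does the product meet the minimum
# specifications, does it add features beyond them, and is it
# aesthetically pleasing?

def product_level(meets_minimum, has_extras, is_pleasing):
    """Map the three cumulative questions to Beginner/Skilled/Expert."""
    if meets_minimum and has_extras and is_pleasing:
        return "Expert"
    if meets_minimum and has_extras:
        return "Skilled"
    if meets_minimum:
        return "Beginner"
    return "Below minimum"

print(product_level(meets_minimum=True, has_extras=True, is_pleasing=False))  # Skilled
```

The ordering of the checks matters: because the levels are cumulative, the function tests the strictest condition first.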
Example 1: The desired product is a representation of a cube made out of cardboard in
an elementary geometry class.
Learning Competencies: The final product submitted by the student must:
1. possess the correct dimensions (5” x 5” x 5”) – (minimum specifications)
2. be sturdy, made of durable cardboard and properly fastened together – (skilled
specifications)
3. be pleasing to the observer, preferably properly colored for aesthetic purposes –
(expert level)
Example 2: The product desired is a scrapbook illustrating the historical event called EDSA I
People Power.
Learning Competencies: The scrapbook presented by the students must:
1. contain pictures, newspaper clippings and other illustrations of the main characters
of EDSA I People Power, namely: Corazon Aquino, Fidel V. Ramos, Juan Ponce
Enrile, Ferdinand E. Marcos and Cardinal Sin – (minimum specifications)
2. contain remarks and captions for the illustrations made by the student himself for the
roles played by the characters of EDSA I People Power – (skilled level)
2. Task Designing
How should a teacher design a task for product-oriented performance-based assessment?
The design of the task in this context depends on what the teacher desires to observe as outputs of
the students. The concepts that may be associated with task designing include:
a. Complexity. The level of complexity of the project needs to be within the range of
ability of the students. Projects that are too simple tend to be uninteresting for the
students while projects that are too complicated will most likely frustrate them.
b. Appeal. The project or activity must be appealing to the students. It should be interesting
enough that students are encouraged to pursue the task to completion. It should lead to
self-discovery of information by the students.
c. Creativity. The project needs to encourage students to exercise creativity and divergent
thinking. Given the same set of materials and project inputs, how does one best present
the project? It should lead the students into exploring the various possible ways of
presenting the final output.
d. Goal-Based. Finally, the teacher must bear in mind that the project is produced in order
to attain a learning objective. Thus, projects are assigned to students not just for the sake
of producing something but for the purpose of reinforcing learning.
Example: Paper folding is a traditional Japanese art. However, it can be used as an
activity to teach the concept of plane and solid figures in geometry. Provide the students with a
given number of colored papers and ask them to construct as many plane and solid figures as they can from
these papers without cutting them (by paper folding only).
3. Scoring Rubrics
Scoring rubrics are descriptive scoring schemes that are developed by teachers or other
evaluators to guide the analysis of the products or processes of students’ efforts (Brookhart, 1999).
Scoring rubrics are typically employed when a judgment of quality is required and may be used to
evaluate a broad range of subjects and activities.
3.1 Criteria Setting. The criteria for scoring rubrics are statements which identify “what really
counts” in the final output. The following are the most often used major criteria for product assessment:
Quality
Creativity
Comprehensiveness
Accuracy
Aesthetics
From the major criteria, the next task is to identify statements that would make the major
criteria more focused and objective. For instance, if we were scoring an essay on: “Three Hundred Years
of Spanish Rule in the Philippines”, the major criterion “Quality” may possess the following
substatements:
their previous performances. Scoring rubrics have the advantage of providing a mechanism for
immediate feedback.
In contrast, suppose the main purpose of the oral presentation is to determine the students'
knowledge of the facts surrounding the EDSA I revolution; then a specific scoring rubric would
be necessary. A general scoring rubric for evaluating a sequence of presentations may not be adequate
since, in general, events such as EDSA I (and EDSA II) differ in the factors surrounding them (what
caused the revolutions) and in their ultimate outcomes. Thus, to evaluate the students'
knowledge of these events, it will be necessary to develop a specific scoring guide for each
presentation.
Process of Developing Scoring Rubrics
The development of scoring rubrics goes through a process. The first step in the process entails
the identification of the qualities and attributes that the teacher wishes to observe in the students’ outputs
that would demonstrate their level of proficiency (Brookhart, 1999). These qualities and attributes form
the top level of the scoring criteria for the rubrics. Once done, a decision has to be made whether a
holistic or an analytical rubric would be more appropriate. In an analytic scoring rubric, each criterion is
considered one by one and the descriptions of the scoring levels are made separately. This will then result
in separate descriptive scoring schemes for each criterion or scoring factor. On the other hand, for
holistic scoring rubrics, the collection of criteria is considered throughout the construction of each level of
the scoring rubric and the result is a single descriptive scoring scheme.
The next step after defining the criteria for the top level of performance is the identification and
definition of the criteria for the lowest level of performance. In other words, the teacher is asked to
determine the type of performance that would constitute the worst performance or a performance which
would indicate lack of understanding of the concepts being measured. The underlying reason for this step
is for the teacher to capture the full range of performance on the criteria being measured. The approach
suggested would therefore result in at least three levels of performance.
It is of course possible to make finer and finer distinctions between performances. For instance,
we can compare the middle-level performance expectations with the best performance criterion and come
up with an above-average performance criterion, or compare the middle-level performance expectations with
the worst level of performance to come up with a slightly-below-average performance criterion, and so on.
This comparison process can be continued until the desired number of score levels is reached or until no
further distinctions can be made. If meaningful distinctions between the score categories cannot be made,
then additional score categories should not be created (Brookhart, 1999). It is better to have a few
meaningful score categories than to have many score categories that are difficult or impossible to
distinguish.
A note of caution: it is suggested that each score category be defined using descriptors of the
work rather than value judgments about the work (Brookhart, 1999). For example, "Student's sentences
contain no errors in subject-verb agreement" is preferable to "Student's sentences are good." The
phrase "are good" requires the evaluator to make a judgment, whereas the phrase "no errors" is
quantifiable. Finally, we can test whether a scoring rubric is reliable by asking two or more teachers
to score the same set of projects or outputs and correlating their individual assessments. High correlations
between the raters imply high interrater reliability. If the scores assigned by the teachers differ greatly,
this suggests that the scoring rubric needs refinement. It may be necessary to
clarify the scoring rubric so that it means the same thing to different scorers.
Resources
Currently, there is a broad range of resources available to teachers who wish to use scoring rubrics
in their classrooms. These resources differ both in the subject that they cover and the level that they are
designed to assess. The examples provided below are only a small sample of the information that is
available.
For K-12 teachers the State of Colorado (1998) has developed an on-line set of general, holistic
scoring rubrics that are designed for the evaluation of various writing assessments. The Chicago Public
Schools (1999) maintain an extensive electronic list of analytic and holistic scoring rubrics that span the
broad array of subjects represented throughout K-12 education. For mathematics teachers, Danielson
has developed a collection of reference books that contain scoring rubrics that are appropriate to the
elementary, middle school and high school mathematics classrooms (1997a, 1997b; Danielson &
Marquez, 1998).
Resources are also available to assist college instructors who are interested in developing and using
scoring rubrics in their classrooms. Kathy Schrock's Guide for Educators (2000) contains electronic
materials for both the pre-college and the college classroom. In The Art and Science of Classroom
Assessment: The Missing Part of Pedagogy, Brookhart (1999) provides a brief, but comprehensive review
of the literature on assessment in the college classroom. This includes a description of scoring rubrics and
why their use is increasing in the college classroom. Moskal (1999) has developed a web site that
contains links to a variety of college assessment resources, including scoring rubrics.
The resources described represent only a fraction of those that are available. The ERIC
Clearinghouse on Assessment and Evaluation [ERIC/AE] provides several additional useful web sites.
One of these, Scoring Rubrics – Definitions & Construction (2000b), specifically addresses questions that
are frequently asked with regard to scoring rubrics. This site also provides electronic links to web
resources and bibliographic references to books and articles that discuss scoring rubrics.