Introduction
This lesson will focus on the different principles or criteria of high-quality assessment, and it will provide suggestions for practical steps you can take to keep the quality of your assessments high.
Objectives
After completing this lesson, you are expected to:
Before moving on to the different criteria, let us first answer the question, “What is high-quality assessment?”
PRINCIPLE 1. CLEAR AND APPROPRIATE LEARNING TARGETS
Sound assessment begins with clear and appropriate learning targets. A learning target is defined as a statement of student performance that includes both a description of what students should know, understand, and be able to do at the end of the unit of instruction and, as much as possible, the criteria for judging the level of performance.
Example 1. At the end of the lesson, the students can identify at least five
organs of the digestive system.
Example 2. After the discussion, 80% of the students can add similar fractions with at least 70% accuracy.
According to Stiggins and Conklin (1992), there are five types of learning targets. As summarized in Table 1, these targets are not presented in a hierarchy or order. None of these is more important than any other; rather, each simply represents a type of target that can be identified and used for assessment.
The types of learning targets presented provide a start in identifying the focus of instruction and assessment, but you will find other sources that are more specific about learning targets, such as Bloom’s Taxonomy of Objectives (Table 2).
Table 2. Taxonomy of Cognitive Objectives
Original Bloom’s Taxonomy | Revised Bloom’s Taxonomy | Illustrative Verbs
Knowledge | Remember | Names, lists, recalls, defines, describes
The different learning targets can also be related to the three domains of
development of the learners, namely: cognitive, psychomotor, and affective. These
three domains are commonly used by the teachers in setting instructional objectives.
Different Learning Targets | Domain of Development
1. Knowledge | Cognitive
2. Reasoning | Cognitive
3. Skills | Psychomotor
4. Products | Psychomotor
5. Affect | Affective
PRINCIPLE 2. APPROPRIATE ASSESSMENT METHODS

Assessment Method | Examples | Learning Targets Assessed
Selection type | True/False, Multiple Choice, Matching Type, Identification | Knowledge
Essay | Restricted-response, Extended-response | Reasoning
Performance-based | Process-oriented (Presentations, Athletics, Demonstrations, Exhibitions); Product-oriented (Papers, Projects) | Skills, Products
Oral Questioning | Oral examinations, Conferences, Interviews | Knowledge, Reasoning, Affect
Observation | Informal, Formal | Skills, Products, Affect
Self-Report | Attitude Surveys, Sociometric Devices, Questionnaires, Surveys | Affect, Products
PRINCIPLE 3. VALIDITY
The concept of validity is very familiar, and it is at the heart of any type of high-quality assessment. Broadly defined, validity refers to the appropriateness of the inferences, uses, and consequences that result from the assessment. The more popular definition of this concept states that “it is the extent to which a test measures what it is supposed to measure.” Although this notion is important, validity is more than that.
How do we determine the validity of the assessment method or the test that we
use?
Content-related evidence

Number of items | 11 | 7 | 3 | 3 | 3 | 3 | Total: 30
% of items | 70% (first three columns) | 30% (last three columns) | Total: 100%
The table is completed by simply indicating the number of items and the
percentage of items from each type of learning target. For example, if the topic is
vertebrates, you might have reptiles as one topic. If there were ten items for reptiles
and N (total number items ) = 30, then 33.33% would be included in that table under
% of items. The rest of the table is completed by your judgment as to whether which
learning targets will be assessed, what area of the content will be sampled, and how
much of the assessment is measuring each target. In this process, evidence of
content-related validity is established.
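The percentage computation described above can be sketched in a few lines of Python. The topics and item counts below are hypothetical, chosen only so that the reptiles example (10 of 30 items) matches the text:

```python
# Hypothetical item counts per content area for a 30-item test on vertebrates.
item_counts = {"reptiles": 10, "amphibians": 6, "birds": 7, "mammals": 7}

total_items = sum(item_counts.values())  # N = 30

# The "% of items" row: each topic's share of the whole test.
percent_of_items = {
    topic: round(100 * count / total_items, 2)
    for topic, count in item_counts.items()
}

print(percent_of_items["reptiles"])  # 33.33
```

The same arithmetic fills the learning-target columns: count the items written for each target, then divide by the total number of items.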
Criterion-related evidence
Construct-related evidence
PRINCIPLE 4. RELIABILITY
Like validity, the term reliability has been used for so many years to describe
an essential characteristic of sound assessment. Reliability is concerned with the
consistency, stability, and dependability of the results. In other words, a reliable
result is one that shows similar performance at different times or under different
conditions.
Suppose Mrs. Caparas is assessing her students’ addition and subtraction skills, and she decides to give the students a twenty-point quiz to determine their skills. She examines the results but wants to be sure about the level of performance before designing appropriate instruction. So she gives another quiz two days later on the same addition and subtraction skills. The results are as follows:
Student | Addition Quiz 1 | Addition Quiz 2 (2 days later) | Subtraction Quiz 1 | Subtraction Quiz 2 (2 days later)
Morgan | 18 | 16 | 13 | 20
Ashley | 10 | 12 | 18 | 10
Z-lo | 9 | 8 | 8 | 14
Lexy | 16 | 15 | 17 | 12
Mia | 19 | 18 | 19 | 11
As you can see from the table, the scores for addition are fairly consistent. Students who scored high on the first quiz also scored high on the second quiz, and students who scored low did so on both quizzes. Consequently, the results for addition are reliable. For subtraction, on the other hand, there is considerable change in performance from the first quiz to the second. Students scoring low on the first quiz scored high on the second. The results for subtraction, then, are less reliable because they are not consistent; the scores contradict one another.
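This consistency can be quantified with a Pearson correlation coefficient between the two quiz administrations: values near +1 indicate consistent (reliable) scores, while values near zero or below signal unreliable results. A minimal sketch using the scores from the table above:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Scores from the table above (Morgan, Ashley, Z-lo, Lexy, Mia).
addition_q1, addition_q2 = [18, 10, 9, 16, 19], [16, 12, 8, 15, 18]
subtraction_q1, subtraction_q2 = [13, 18, 8, 17, 19], [20, 10, 14, 12, 11]

print(round(pearson_r(addition_q1, addition_q2), 2))        # 0.95 -> reliable
print(round(pearson_r(subtraction_q1, subtraction_q2), 2))  # -0.56 -> unreliable
```

The addition scores correlate strongly across the two days, while the subtraction scores actually correlate negatively, matching the informal reading of the table.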
The teacher’s goal is to use the quiz to accurately determine the defined skill.
In the case of addition, she can get a fairly accurate picture with an assessment that is
reliable. For subtraction, on the other hand, she cannot use these results alone to
estimate the students’ real or actual skill. More assessments are needed before she
can be confident that scores are reliable and thus provide a dependable result.
But even if the scores in addition are reliable, they are not without some degree of error. In fact, all assessments have error; they are never perfect measures of the trait or skill. The concept of error in assessment is critical to understanding reliability. Conceptually, whenever we observe or measure something, we get an observed score or result. This observed score is the sum of the student's true or real ability or skill plus some degree of error:

Observed score = True score + Error
So what are the sources of error in assessment that may affect test reliability?
Figure 3 summarizes the different sources of assessment error.
Internal sources of error: health, mood, motivation, test-taking skills, anxiety, fatigue, and general ability.

External sources of error: directions, luck, item ambiguity, heat and lighting in the room, sampling of items, observer differences, test interruptions, scoring, and observer bias.
In the previous example, what Mrs. Caparas did is called the test-retest method of establishing reliability: giving the same test twice to the same students at two different points in time. Other methods include parallel-forms and alternate-forms reliability estimates. Parallel forms of a test exist when, for each form of the test, the means and variances of the observed test scores are equal. Alternate forms are simply different versions of a test that have been constructed to be parallel; the two forms are typically designed to be equivalent with respect to variables such as content and level of difficulty.
Other methods that require statistical procedures are split-half reliability estimates, the Spearman-Brown formula, the Kuder-Richardson formulas, and coefficient alpha.
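As a brief sketch of one of these procedures (the worked numbers below are illustrative, not from the text), the Spearman-Brown formula estimates the reliability of a full-length test from the correlation between its two halves:

```python
def spearman_brown(r_half: float) -> float:
    """Full-test reliability estimated from a split-half correlation:
    r_full = 2 * r_half / (1 + r_half)."""
    return 2 * r_half / (1 + r_half)

# A split-half correlation of 0.60 implies a full-test reliability of 0.75.
print(round(spearman_brown(0.60), 2))  # 0.75
```

The formula corrects for the fact that splitting a test in half shortens it, and shorter tests are less reliable than longer ones built from comparable items.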
PRINCIPLE 5. FAIRNESS
A fair assessment is one in which it is clear what will and will not be tested, and your objective is not to fool, trick, or outguess students on the assessment. Rather, you need to be very clear and specific about the learning target – what is to be assessed and how it will be scored.
Opportunity to learn
This means that students know what to learn and are then provided ample time and appropriate instruction. It is usually not sufficient to simply tell students what will be assessed and then test them. You must plan instruction that focuses specifically on helping students understand, providing students with feedback on their progress, and giving students the time they need to learn.
Prerequisite knowledge and skills
Avoiding stereotypes
Stereotypes are judgments about how groups of people will behave based on characteristics such as gender, race, socioeconomic status, and physical appearance. Though it is impossible to avoid stereotypes completely because of our values, beliefs, and preferences, we can control the influence of these prejudices.
PRINCIPLE 6. POSITIVE CONSEQUENCES
As a teacher, you should ask yourself these questions: How will the assessment affect student motivation? Will students be more or less likely to be meaningfully involved? Will their motivation be intrinsic or extrinsic? How will the assessment affect my teaching? What will the parents think about my assessment? These questions must be answered clearly. It is important to remember that the nature of classroom assessment has important consequences for teaching and learning.
Positive consequences on students
The most direct consequence of assessment is that students learn and study in a way consistent with your assessment task. If your assessment is multiple choice to determine the students’ knowledge of specific facts, students will tend to memorize information. Assessment also has clear consequences for students’ motivation. If students realize that you always give essay tests, and that they cannot score well because of your rather unrealistic standards for grading essays, then they will not exert effort in reviewing the lessons (even though a lot of material needs to be memorized). But if students know what will be assessed and how it will be scored, and if they believe that the assessment will be fair, they are likely to be motivated to learn. Finally, the student-teacher relationship is influenced by the nature of assessment: when teachers construct assessments carefully and provide feedback to students, the relationship is strengthened.
Positive consequences on teachers
PRINCIPLE 7. PRACTICALITY AND EFFICIENCY
High-quality assessments are practical and efficient. Because time is a limited commodity for teachers, factors such as familiarity with the method, time required, complexity of administration, ease of scoring, ease of interpretation, and cost should be considered.
Teacher familiarity with the method
This includes knowing the strengths and limitations of the method, how to administer it, and how to score and interpret responses. Otherwise, teachers risk time and resources for questionable results.
Time required
Gather only as much information as you need for the decision. The time required should include how long it takes to construct the assessment and how long it takes to score the results. Thus, if you plan to use a test format (like multiple choice) over and over for different groups of students, it is efficient to put considerable time into preparing the assessment, as long as you can reuse many of the same test items each year or semester.
Complexity of administration
The directions and procedures for administration should be clear, so that little time and effort are needed. Assessments that require long and complicated instructions are less efficient, and because students may misunderstand them, reliability and validity are affected.
Ease of scoring
Objective tests are obviously easier to score than other methods. In general, use the easiest method of scoring appropriate to the method and purpose of the assessment. Performance-based assessments, essays, and papers are more difficult to score, so it is more practical to use rating scales and checklists rather than writing extended individualized evaluations.
Ease of interpretation
Objective tests that report a single score are usually the easiest to interpret, while individualized written comments are more difficult to interpret. You can share with students answer keys and other materials that give meaning to the different scores or grades.
Cost
Like other practical aspects, it is best to use the most economical assessment. However, it would certainly be unwise to use a less reliable or less valid instrument just because it costs less.