
Lesson 2

PRINCIPLES OF HIGH QUALITY ASSESSMENT

Introduction

Formulating instructional objectives or learning targets is the first step in conducting both teaching and evaluation. Once you have determined your objectives or learning targets, or have answered the question "what to assess," you will probably be concerned with answering the question "how to assess?" At this point, it is important to keep in mind several criteria that determine the quality and credibility of the assessment methods that you choose.

This lesson will focus on the different principles or criteria and will provide suggestions for practical steps you can take to keep the quality of your assessments high.

Objectives
After completing this lesson, you are expected to:

 enumerate the different principles of high-quality assessment;
 differentiate the different learning targets and the appropriate assessment methods;
 evaluate the appropriateness of the assessment method for the learning target;
 explain the importance of test validity and reliability;
 explain the methods of establishing validity evidence;
 discuss the ways of improving test reliability;
 present the positive consequences of assessment on students and teachers;
 analyze fairness in assessment practice; and
 explain the importance of test practicality and efficiency.
High-quality assessment

Before moving on to the different criteria, let us first answer the question, "What is high-quality assessment?"

Until recently, test validity, reliability, and efficiency were the terms used to describe the quality of classroom assessment, and this placed the emphasis on highly technical, statistically sophisticated standards. In most classrooms, however, such technical qualities have little relevance because the purpose of assessment is different. This does not mean downplaying the importance of the validity and reliability of assessment methods; rather, high-quality assessment adds other criteria as well.

High-quality assessment is not concerned only with the detailed inspection of the test itself; rather, it focuses on the use and consequences of the results and on what assessments get students to do. The criteria of high-quality assessment, which will be discussed in detail in this lesson, are presented in a concept map in Figure 1.

Figure 1. Criteria for ensuring high-quality classroom assessments


PRINCIPLE 1. CLEAR AND APPROPRIATE LEARNING TARGETS

Sound assessment begins with clear and appropriate learning targets. A learning target is defined as a statement of student performance that includes both a description of what students should know, understand, and be able to do at the end of a unit of instruction and, as much as possible, the criteria for judging the level of performance.

Example 1. At the end of the lesson, the students can identify at least five
organs of the digestive system.

Example 2. After the discussion, 80% of the students can add similar fractions with at least 70% accuracy.

Types and sources of learning targets

According to Stiggins and Conklin (1992), there are five types of learning targets. As summarized in Table 1, these targets are not presented as a hierarchy or order. None of them is more important than any other; rather, each simply represents a type of target that can be identified and used for assessment.

Table 1. Types of Learning Targets


Learning Target                         Definition
Knowledge and simple understanding      Student mastery of substantive subject matter and procedures
Deep understanding and reasoning        Student ability to reason and solve problems
Skills                                  Student ability to demonstrate achievement-related skills and perform psychomotor behaviors
Products                                Student ability to create achievement-related products such as written reports, oral presentations, and art products
Affect                                  Student attainment of affective states such as attitudes, values, interests, and self-efficacy

The types of learning targets presented provide a start for identifying the focus of instruction and assessment, but you will find other sources that are more specific about learning targets, such as Bloom's Taxonomy of Objectives (Table 2).
Table 2. Taxonomy of Cognitive Objectives
Original Bloom's Taxonomy    Revised Bloom's Taxonomy    Illustrative Verbs
Knowledge                    Remember                    names, lists, recalls, defines, describes
Comprehension                Understand                  explains, rephrases, summarizes, converts, interprets
Application                  Apply                       demonstrates, modifies, produces, solves, applies
Analysis                     Analyze                     distinguishes, compares, differentiates, classifies
Synthesis                    Create                      generates, combines, constructs, formulates, proposes
Evaluation                   Evaluate                    justifies, criticizes, concludes, supports, defends, confirms

Each level of the taxonomy represents an increasingly complex type of cognition, with the knowledge level (remember) considered the lowest. However, the remaining five levels are referred to as "intellectual abilities and skills." Although this categorization of cognitive tasks was created more than 50 years ago and other, more contemporary frameworks have been offered, the taxonomy is still valuable in providing a comprehensive list of possible learning objectives with clear action verbs that operationalize the learning targets.

The different learning targets can also be related to the three domains of
development of the learners, namely: cognitive, psychomotor, and affective. These
three domains are commonly used by the teachers in setting instructional objectives.
Different Learning Targets     Domain of Development
1. Knowledge                   Cognitive
2. Reasoning                   Cognitive
3. Skills                      Psychomotor
4. Products                    Psychomotor
5. Affect                      Affective

PRINCIPLE 2. APPROPRIATENESS OF ASSESSMENT METHODS


Many different approaches or methods are used to assess students, but your choice will depend greatly on the match between the learning target and the method. The
different methods of assessment are categorized according to the nature and
characteristics of each method. There are four major categories: selected-response,
constructed-response, teacher observation, and self-report.
I. Selected response
a. Multiple choice
b. Binary choice (e.g., true/false)
c. Matching Type
d. Identification
II. Constructed response
a. Brief constructed response
1. Short answer
2. Completion/Fill in the blanks
3. Label a diagram
b. Performance-based tasks
1. Products (paper, project, poem, portfolio, reflection, journal, graph)
2. Skills (speech, demonstration, debate, recital)
c. Essay items
1. Restricted-response
2. Extended-response
d. Oral questioning
1. Informal questioning
2. Examinations
3. Interviews
III. Teacher Observation
a. Informal
b. Formal
IV. Self-Report
a. Attitude survey
b. Questionnaires
c. Inventories

McMillan (2003) introduced another categorization of assessment methods and their relative advantages when it comes to assessing learning targets. His attempt to appropriately match learning targets with assessment methods is summarized in the table that follows. He suggested that a teacher has to select assessments on the basis of what will provide the fairest indication of student achievement for all students.

Assessment Method     Examples                                                Learning Target Measured
Objective             Supply type (completion, short answer, enumeration);    Knowledge
                      selection type (true/false, multiple choice,
                      matching type, identification)
Essay                 Restricted-response; extended-response                  Reasoning
Performance-based     Process-oriented (presentations, athletics,             Skills, Products
                      demonstrations, exhibitions); product-oriented
                      (papers, projects)
Oral Questioning      Oral examinations, conferences, interviews              Knowledge, Reasoning, Affect
Observation           Informal; formal                                        Skills, Products, Affect
Self-Report           Attitude surveys, sociometric devices,                  Affect, Products
                      questionnaires, surveys

PRINCIPLE 3. VALIDITY

The concept of validity is familiar, and it is at the heart of any type of high-quality assessment. Broadly defined, validity refers to the appropriateness of the inferences, uses, and consequences that result from the assessment. The more popular definition of the concept states that it is "the extent to which a test measures what it is supposed to measure." Although this notion is important, validity is more than that.

Validity is concerned with the soundness, trustworthiness, or legitimacy of the inferences made on the basis of the obtained scores. In other words, is the interpretation made from the test result reasonable? Is the information gathered the right kind of evidence for the decision to be made or the intended use? How sound is the interpretation of the information? The decision, for example, to pass or fail a student in view of the attainment of some pre-specified standard or criterion should be examined as far as its validity is concerned. It is not the test itself that we determine to be valid; rather, it is the validity of the inferences, conclusions, and consequences arising from our assessment that is being established.

How do we determine the validity of the assessment method or the test that we
use?

Validity is always determined by professional judgment. This judgment is made by the user of the information (i.e., the teacher, in the case of classroom assessment). Traditionally, validity evidence comes from three sources: content-related, criterion-related, and construct-related. How can teachers use these sources of evidence, as well as consequences and uses, to make an overall judgment about the degree of validity of the assessment? The contemporary idea of validity is unitary, with the view that there are different types of evidence to use in determining validity, rather than the traditional view that there are different types of validity.

Content-related evidence

Suppose you wanted to test for everything sixth-grade students learn in a four-week unit on insects. Can you imagine how long the test would be and how much time the students would need to complete it? What you do instead is select a sample of what has been taught and use students' achievement on that sample as the basis for judging whether they demonstrate knowledge of the unit. Adequate sampling, of course, is determined by your professional judgment. This can be done by reviewing the match between the intended inferences and what is on the test. The process begins with clear learning targets and the preparation of a table of specification for those targets. The Table of Specification, or test blueprint, is a two-way grid that shows the content and the types of learning targets. A sample Table of Specification for Science is presented in Table 3.

Table 3. Table of Specification (TOS) in Science-6


                         Levels of Thinking Skills
Contents          Remember  Understand  Apply  Analyze  Evaluate  Create   Total no. of items   % of items
Reptiles **           4         2         1       1        1        1             10               33.33
Mammals ***           6         4         1       1        1        1             14               46.67
Birds *               1         1         1       1        1        1              6               20.00
Item placement      1-11     12-18     19-21   22-24    25-27    28-30
Number of items      11         7         3       3        3        3             30
% of items            70% (Remember to Apply)     30% (Analyze to Create)                         100.00

The table is completed by simply indicating the number of items and the percentage of items for each type of learning target. For example, if the topic is vertebrates, you might have reptiles as one content area. If there were ten items for reptiles and N (the total number of items) = 30, then 33.33% would be entered in the table under "% of items." The rest of the table is completed by your judgment as to which learning targets will be assessed, what areas of the content will be sampled, and how much of the assessment measures each target. Through this process, evidence of content-related validity is established.
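To make the arithmetic behind the "% of items" column concrete, the short sketch below (not part of the original lesson) computes each topic's share of the test from the item counts in Table 3.

```python
# A minimal sketch of the "% of items" computation in a Table of Specification.
# Topic names and item counts follow the sample TOS in Table 3.

item_counts = {"Reptiles": 10, "Mammals": 14, "Birds": 6}

total_items = sum(item_counts.values())  # 30 items in the whole test

for topic, count in item_counts.items():
    percent = count / total_items * 100
    print(f"{topic}: {count} items = {percent:.2f}% of the test")
# Reptiles: 10 items = 33.33% of the test
# Mammals: 14 items = 46.67% of the test
# Birds: 6 items = 20.00% of the test
```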

Another consideration related to this type of evidence is the extent to which an assessment can be said to have instructional validity, that is, the match between what is taught and what is assessed. One way to check this is to examine the Table of Specification after teaching a unit to determine whether the emphasis in different areas is consistent with what was emphasized in class. For example, if you emphasized knowledge in teaching a unit (e.g., facts, definitions of terms, places, dates, and names), it would not be logical to test for reasoning and make inferences about the knowledge students learned in the class.

Criterion-related evidence

This is established by relating an assessment to some other valued measure


(criterion) that either provides an estimate of current performance (concurrent
criterion-related evidence) or predicts future performance (predictive
criterion-related evidence). Classroom teachers do not conduct formal studies to
obtain correlation coefficients that will provide evidence of validity, but the principle
is very important for teachers to employ. The principle is that when you have two or
more measures of the same thing, and these measures provide similar results, then
you have established criterion-related evidence. For example, if your assessment of a
student’s skills in using a microscope through observation coincides with the
student’s score on a quiz that tests steps in using microscope, then you have
criterion-related evidence that your inference about the skill of this student is valid.

Similarly, if you are interested in the extent to which preparation by your students, as indicated by scores on a final exam in mathematics, predicts how well they will do next year, then you can examine the grades of previous students and determine informally whether students who scored high on your final exam are getting high grades and students who scored low are obtaining low grades. If a correlation is found, then an inference about predicting how your students will perform, based on their final exam, is valid; this is predictive criterion-related validity in particular.
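As a rough illustration of this informal check, the sketch below (with made-up numbers, not data from the lesson) computes the correlation between final-exam scores and the grades the same students earned the following year; a strong positive coefficient would support a predictive criterion-related inference.

```python
# A minimal sketch of gathering predictive criterion-related evidence:
# do final-exam scores relate to the grades students earn the next year?
from statistics import correlation  # available in Python 3.10+

final_exam = [95, 88, 76, 70, 62, 55]        # hypothetical final-exam scores
next_year_grades = [92, 90, 80, 75, 68, 60]  # hypothetical grades one year later

r = correlation(final_exam, next_year_grades)  # Pearson r
print(f"Predictive criterion-related evidence: r = {r:.2f}")
# A high positive r supports the inference that the final exam predicts later performance.
```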

Construct-related evidence

A construct refers to an unobservable trait or characteristic that a person possesses, such as intelligence, reading comprehension, honesty, self-concept, attitude, reasoning, learning style, or anxiety. These are not measured directly; rather, the characteristic is constructed to account for behavior that can be observed. Three types of construct-related evidence are theoretical, logical, and statistical.

A theoretical explanation defines the characteristic in such a way that its meaning is clear and not confused with any other construct (e.g., what is "attitude," or what does "how much students enjoy reading" mean?). Logical analyses, on the other hand, can be done by asking students to comment on what they were thinking when they answered the questions, or by comparing the scores of groups who, as determined by other criteria, should respond differently. Finally, statistical procedures can be used to correlate scores from measures of the construct with other measures of the same construct and with measures of similar but different constructs. For example, self-concept of academic ability scores from one survey should be related to another measure of the same thing (convergent construct-related evidence) but less related to measures of self-concept of physical ability (divergent construct-related evidence).

PRINCIPLE 4. RELIABILITY

Like validity, the term reliability has been used for so many years to describe
an essential characteristic of sound assessment. Reliability is concerned with the
consistency, stability, and dependability of the results. In other words, a reliable
result is one that shows similar performance at different times or under different
conditions.

Suppose Mrs. Caparas is assessing her students' addition and subtraction skills, and she decides to give the students a twenty-point quiz to determine their skills. She examines the results but wants to be sure about the level of performance before designing appropriate instruction, so she gives another quiz on the same addition and subtraction skills two days later. The results are as follows:
                    Addition                          Subtraction
          Quiz 1    Quiz 2 (2 days later)    Quiz 1    Quiz 2 (2 days later)
Morgan      18              16                 13              20
Ashley      10              12                 18              10
Z-lo         9               8                  8              14
Lexy        16              15                 17              12
Mia         19              18                 19              11

As you can see from the table, the scores for addition are fairly consistent.
Students who scored high on the first quiz also scored high on the second quiz, and
students who scored low did so on both quizzes. Consequently, the results for addition are
reliable. For subtraction, on the other hand, there is considerable change in
performance from the first to the second quiz. Students scoring low on the first quiz
scored high on the second. For subtraction, then, the results are less reliable because
they are not consistent. The scores contradict one another.

The teacher’s goal is to use the quiz to accurately determine the defined skill.
In the case of addition, she can get a fairly accurate picture with an assessment that is
reliable. For subtraction, on the other hand, she cannot use these results alone to
estimate the students’ real or actual skill. More assessments are needed before she
can be confident that scores are reliable and thus provide a dependable result.
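For readers who want to see this consistency expressed as a number, the sketch below (not part of the lesson) computes a test-retest correlation for each skill using the quiz scores from the table above.

```python
# A minimal sketch of summarizing Mrs. Caparas's two quizzes with
# test-retest correlations, using the scores from the table above.
from statistics import correlation  # available in Python 3.10+

addition_q1 = [18, 10, 9, 16, 19]
addition_q2 = [16, 12, 8, 15, 18]
subtraction_q1 = [13, 18, 8, 17, 19]
subtraction_q2 = [20, 10, 14, 12, 11]

print(f"Addition test-retest r    = {correlation(addition_q1, addition_q2):.2f}")
print(f"Subtraction test-retest r = {correlation(subtraction_q1, subtraction_q2):.2f}")
# The addition coefficient is close to +1 (consistent, reliable results),
# while the subtraction coefficient is negative (inconsistent results).
```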

But even if the scores in addition are reliable, they are not without some degree
of error. In fact, all assessments have error; they are never perfect measure of the trait
or skill. The concept of error in assessment is critical to understanding reliability.
Conceptually, whenever we see or measure something, we get an observed score or
result. This observed score is a product of what the true or real ability or skill is plus
some degree of error:

Observed score = True score + error

Reliability is directly related to error. It is not a matter of all or none, as if some results were reliable and others unreliable. Rather, for each assessment there is some degree of error, so we think in terms of low, moderate, or high reliability. It is important to remember that error can be positive or negative; that is, the observed score can be higher or lower than the true score depending on the nature of the error. For example, if a student is sick, tired, in a bad mood, or distracted, the score may have negative error and underestimate the true score. On the other hand, if the student is happy and in good health, the error may be positive and the observed score may be higher than the true score.

So what are the sources of error in assessment that may affect test reliability?
Figure 3 summarizes the different sources of assessment error.
Internal sources of error

 Health
 Mood
 Motivation
 Test-taking skills
 Anxiety
 Fatigue
 General ability

External sources of error

 Directions
 Luck
 Item ambiguity
 Heat in room, lighting
 Sampling of items
 Observer differences
 Test interruptions
 Scoring
 Observer bias

(The figure depicts a student's actual or true knowledge, understanding, reasoning, skills, products, or affect passing through an assessment, subject to these internal and external sources of error, to yield the observed score.)

Figure 3. Possible sources of assessment error

Methods of establishing reliability evidences

In the previous example given, what Mrs. Caparas did is called a test-retest
method of establishing reliability. That is giving the same test twice the same
students at two different points in time. Other methods include parallel-forms
method and alternate-forms reliability estimates. Parallel forms of a test exist when,
for each form of the test, the means and the variances of observed test scores are
equal. Alternate forms are simply different versions of a test that have been
constructed so as to be parallel, in which the two forms of the tests are typically
designed to be equivalent with respect to variables such as content and level of
difficulty.
Other methods that require statistical procedures are the Split-half reliability
estimates, the Spearman-Brown formula, the Kuder-Richardson formulas, and
Coefficient alphas.
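As an illustration of one of these statistical procedures, the sketch below (with hypothetical scores) estimates split-half reliability by correlating two halves of a test and then applying the Spearman-Brown formula to estimate the reliability of the full-length test.

```python
# A minimal sketch of a split-half reliability estimate: correlate scores on
# the odd-numbered items with scores on the even-numbered items, then step the
# half-test correlation up to full-test length with the Spearman-Brown formula.
from statistics import correlation  # available in Python 3.10+

odd_half_scores = [8, 6, 9, 4, 7, 5]   # hypothetical scores on the odd items
even_half_scores = [7, 6, 9, 5, 8, 4]  # hypothetical scores on the even items

r_half = correlation(odd_half_scores, even_half_scores)
r_full = (2 * r_half) / (1 + r_half)   # Spearman-Brown correction

print(f"Half-test correlation:           {r_half:.2f}")
print(f"Estimated full-test reliability: {r_full:.2f}")
```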

To enhance reliability, the following suggestions are to be considered:

 Use a sufficient number of items or tasks. (Other things being equal, longer tests are more reliable.)
 Use independent raters or observers who provide similar scores on the same performances.
 Construct items and tasks that clearly differentiate students on what is being assessed.
 Make sure the assessment procedures and scoring are as objective as possible.
 Continue assessment until the results are consistent.
 Eliminate or reduce the influence of extraneous events or factors.
 Use shorter assessments more frequently rather than fewer but longer assessments.

PRINCIPLE 5. FAIRNESS

A fair assessment is one that provides all students an equal opportunity to demonstrate achievement and yields scores that are comparably valid from one person or group to another. If some students have an advantage over others because of factors unrelated to what is being taught, then the assessment is not fair. Thus, neither the assessment task nor the scoring should be differentially affected by race, gender, ethnic background, or other factors unrelated to what is being assessed.

The following criteria represent potential influences that determine whether or not an assessment is fair.

Student knowledge of learning targets and assessment

A fair assessment is one in which it is clear what will and will not be tested; your objective is not to fool or trick students or to outguess them on the assessment. Rather, you need to be very clear and specific about the learning target: what is to be assessed and how it will be scored.

Opportunity to learn

This means that students know what to learn and are then provided ample time and appropriate instruction. It is usually not sufficient to simply tell students what will be assessed and then test them. You must plan instruction that focuses specifically on helping students understand, provides students with feedback on their progress, and gives students the time they need to learn.
Prerequisite knowledge and skills

It is unfair to assess students on things that require prerequisite knowledge or skills that they do not possess. For example, suppose you want to test math reasoning skills and your questions are based on short paragraphs that provide the needed information. In this situation, math reasoning skills can be demonstrated only if students can read and understand the paragraphs; thus, reading skills are a prerequisite. If students do poorly on the test, their performance may have more to do with a lack of reading skills than with math reasoning.

Avoiding stereotypes

Stereotypes are judgments about how groups of people will behave based on characteristics such as gender, race, socioeconomic status, and physical appearance. Although it is impossible to avoid stereotypes completely because of our values, beliefs, and preferences, we can control the influence of these prejudices.

Avoiding bias in assessment task and procedures

Bias is present if the assessment distorts performance because of the students' ethnicity, gender, race, religious background, and so on. Bias appears in two forms: offensiveness and unfair penalization.

PRINCIPLE 6. POSITIVE CONSEQUENCES

As a teacher, you should ask yourself these questions: How will assessment
affect student motivation? Will students be more or less likely to be meaningfully
involved? Will their motivation be intrinsic or extrinsic? How will the assessment
affect my teaching? What will the parents think about my assessment? These
questions must be answered clearly. It is important to remember that the nature of
classroom assessment has important consequences for teaching and learning.

Positive consequences on students.

The most direct consequence of assessment is that students learn and study in
a way consistent with your assessment task. If your assessment is multiple choice to
determine the students’ knowledge of specific facts, students will tend to memorize
information. Assessment also has clear consequences on students’ motivation. If
students realize that you always give essay tests and that they cannot score well because of your rather unrealistic standards in grading essays, they will not exert effort in reviewing the lessons (even though a lot of material needs to be memorized). But if the students know what will be assessed and how it will be scored, and if they believe that the assessment will be fair, they are likely to be motivated to learn. Finally, the student-teacher relationship is influenced by the nature of assessment: when teachers construct assessments carefully and provide feedback to students, the relationship is strengthened.
Positive consequences on teachers.

Just as students learn depending on the assessment, teachers tend to teach to the test. Thus, if the assessment calls for memorization of facts, the teacher tends to teach lots of facts; if the assessment requires reasoning, then the teacher structures
exercises and experiences that get students to think. Assessment may also influence
how you are perceived by others. Are you comfortable with school administrators
and parents reviewing and critiquing your assessments? What about the views of
other teachers? How do your assessments fit with what you want to be as a
professional? Thus, like students, teachers are affected by the nature of the
assessments they give their students.

PRINCIPLE 7. PRACTICALITY AND EFFICIENCY

High-quality assessments are practical and efficient. Because time is a limited commodity for teachers, factors such as familiarity with the method, time required, complexity of administration, ease of scoring, ease of interpretation, and cost should be considered.

Familiarity with the method

This includes knowing the strengths and limitations of the method, how to administer it, and how to score and interpret responses. Otherwise, teachers risk wasting time and resources on questionable results.

Time required

Gather only as much information as you need for the decision. The time required should include how long it takes to construct the assessment and how long it takes to score the results. Thus, if you plan to use a test format (like multiple choice) over and over for different groups of students, it is efficient to put considerable time into preparing the assessment, as long as you can reuse many of the same test items each year or semester.

Complexity of administration

The directions and procedures for administration should be clear, so that little time and effort are needed. Assessments that require long and complicated instructions are less efficient, and because of probable student misunderstanding, reliability and validity are affected.

Ease of scoring

It is obvious that objective tests are easier to score than other methods. In general, use the easiest method of scoring appropriate to the method and purpose of the assessment. Performance-based assessments, essays, and papers are more difficult to score, so it is more practical to use rating scales and checklists rather than write extended, individualized evaluations.
Ease of interpretation

Objective tests that report a single score are usually the easiest to interpret, while individualized written comments are more difficult to interpret. You can share with students the answer key and other materials that give meaning to the different scores or grades.

Cost

Like other practical aspects, it is best to use the most economical assessment. However, it would certainly be unwise to use a less reliable or less valid instrument just because it costs less.
