
ProfEd 221—Assessment in Learning 1

Basic Concepts and Principles in Educational Assessment
(Weeks 2-3)

COURSE OUTCOME (CO)

CO1: Demonstrate mastery of basic concepts and principles in assessment and test development

INTENDED LEARNING OUTCOMES (ILO)

1. Define educational assessment.
2. Recognize basic concepts and principles in assessment of learning like testing, measurement, evaluation, and assessment.
3. Characterize testing, measurement, evaluation, and assessment.
4. Differentiate standardized and classroom assessments.

Assessment refers to all the means used in schools to measure student performance. These include quizzes and tests, written evaluations, and grades. Student evaluation usually focuses on academic achievement, but many schools also assess behaviors and attitudes.

Assessments are critical because teaching involves many kinds of judgments—decisions based on values:
• Is this software appropriate for my students?
• Will Juan do better if he repeats Grade 1?
• Should Mira get 75 or 80 on the project?

Test, Measurement, Evaluation, & Assessment

Assessment: The process of gathering, evaluating, and using information [3]

A measure of the degree to which instructional objectives have been attained [8]

A formal attempt to determine a student's status with respect to an educational variable of interest. Synonyms: measurement, test [6]

Any of a variety of procedures used to obtain information about student performance, including traditional paper-and-pencil tests as well as extended responses (e.g., essays), performances of authentic tasks (e.g., laboratory experiments), teacher observations, and student self-report. Assessment answers the question: "How well does the individual perform?" [4]

From an instructional standpoint, assessment may be defined as a systematic process of determining the extent to which instructional objectives are achieved by students. The process includes both measurement procedures (e.g., tests) and nonmeasurement procedures (e.g., informal observation) for describing changes in student performance, as well as value judgments concerning the desirability of the changes (see the Figure below).

Figure 1. The assessment process [4]

Formal assessments: Objective1 and rigorous methods for obtaining information about student learning, such as tests, quizzes, book reports, and assigned in-class presentations [9]

Informal assessments: Observations teachers make of students while they are participating in the classroom, doing work, and talking to the teacher or to other students. Sometimes information is in written form that teachers obtain from some students [9]

Ungraded (formative) assessments that gather information from multiple sources to help teachers make decisions [10]

Evaluation: In the context of classroom assessment, evaluation is the judgment of the effectiveness of either an instructional activity or the competence of the teacher [6]. Judgment involves qualitative interpretation of student scores.

Appraisal/judgment is based on either qualitative or quantitative information gathered [4]

Measurement(s): Data on student performance and learning that are collected by teachers [9]

1 Objective vs subjective

2nd Sem 2020–2021 jmmillare@usm.edu.ph Page 1 of 11





An evaluation expressed in quantitative (number) terms [10]

The process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic. Measurement answers the question: "How much?" [4]

Test2: An instrument or systematic procedure for measuring a sample of behavior by posing a set of questions in a uniform manner. Because a test is a form of assessment, tests also answer the question: "How well does the individual perform—either in comparison with others or in comparison with a domain of performance tasks?" [4]

Measurement is quantitative—the description of an event or characteristic using numbers. Measurement tells how much, how often, or how well by providing scores, ranks, or ratings. Instead of saying, "Sarah doesn't seem to understand addition," a teacher might say, "Sarah answered only 2 of the 15 problems correctly in her addition homework."

Measurement also allows a teacher to compare one student's performance on a particular task with either a specific standard or the performances of other students on the same task.

Not all the decisions made by teachers involve measurement. Some decisions are based on information that is difficult to express numerically: student preferences, discussions with families, previous experiences, even intuition. But measurement does play a large role in many classroom decisions, and, when properly done, it can provide unbiased data for decision making.

Increasingly, measurement specialists are using the term assessment to describe the process of gathering information about students' learning. Assessment is broader than testing and measurement because it includes all kinds of ways to sample and observe students' skills, knowledge, and abilities (Linn & Miller, 2005).

Assessments can be formal, such as unit tests, or informal, such as observing who emerges as a leader in group work.

Assessments can be designed by classroom teachers or by local, state, or national agencies such as school districts, the PRC (Professional Regulation Commission), or the CEM (Center for Educational Measurement). And today, assessments can go well beyond paper-and-pencil exercises to judgments based on students' performances, portfolios, projects, or products.

The vast array of assessment procedures used in the school can be classified and described in many different ways [4].

Nature of the Assessment
1. Maximum performance (what a person can do)
2. Typical performance (what a person will do)

Format of Assessment
1. Selected-response test (student chooses the answer to a question from available options)
2. Complex-performance assessment (student constructs an extended response or performs in response to a complex task)

Use in Classroom Instruction
1. Placement assessment (measures entry behavior)
2. Formative assessment (monitors learning progress)
3. Diagnostic assessment (identifies causes of learning problems)
4. Summative assessment (measures end-of-course achievement)

Method of Interpreting the Results
1. Norm referenced (describes student performance in terms of the relative position held in some known group)
2. Criterion referenced (describes student performance in terms of a clearly defined and delimited domain of learning tasks)

Additional details are presented in the following table.

Describing classroom assessment procedures [4]

A. Nature of Assessment
• Maximum performance. Function: determines what individuals can do when performing at their best. Illustrative instruments: aptitude tests, achievement tests.
• Typical performance. Function: determines what individuals will do under natural conditions. Illustrative instruments: attitude, interest, & personality inventories; observational techniques; peer appraisal.

2 Popham (2017) has the same definition for test, measurement, and assessment. For him, these terms are interchangeable. When we talk about the process of assessment, however, we adopt the description of [4]—test is the instrument, measurement is the scoring, and assessment is the whole process, which also includes judgment (evaluation) of objective attainment.
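The chain in footnote 2, from test (instrument) to measurement (scoring) to evaluation (judgment), can be sketched in a few lines of code. This is a hypothetical Python illustration only: the 15-item quiz echoes the Sarah example above, while the answer key and the 60% mastery cutoff are invented assumptions, not part of the source.

```python
# Hypothetical sketch: test -> measurement -> evaluation.
# The answer key and the 60% mastery cutoff are assumptions
# made for illustration; only the "2 of 15" figure comes from the text.

def measure(responses, answer_key):
    """Measurement: obtain a numerical description (raw score)."""
    return sum(r == k for r, k in zip(responses, answer_key))

def evaluate(raw_score, n_items, cutoff=0.60):
    """Evaluation: a value judgment about the measured score."""
    return "mastered" if raw_score / n_items >= cutoff else "needs reteaching"

answer_key = ["a"] * 15                 # the test instrument (15 items)
sarah = ["a"] * 2 + ["b"] * 13          # Sarah answers only 2 correctly

score = measure(sarah, answer_key)      # measurement: "how much?"
print(score)                            # 2
print(evaluate(score, len(answer_key))) # needs reteaching
```

The point of the sketch is the separation of roles: the instrument poses the tasks, the scoring produces a number, and only the final step attaches a value judgment to that number.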



Describing classroom assessment procedures [4] (continued)

B. Form of Assessment
• Fixed-choice test. Function: efficient measurement of knowledge and skills; indirect indicator. Illustrative instruments: standardized multiple-choice tests.
• Complex-performance assessment. Function: measurement of performance in contexts and on problems valued in their own right. Illustrative instruments: hands-on laboratory experiments, projects, essays, oral presentations.

C. Use in Classroom Instruction
• Placement. Function: determines prerequisite skills, degree of mastery of course goals, and/or best mode of learning. Illustrative instruments: readiness tests, aptitude tests, pretests on course objectives, self-report inventories, observational techniques.
• Formative. Function: determines learning progress, provides feedback to reinforce learning, and corrects learning errors. Illustrative instruments: teacher-made tests, observational techniques.
• Diagnostic. Function: determines causes (intellectual, physical, emotional, environmental) of persistent learning difficulties. Illustrative instruments: published diagnostic tests, teacher-made diagnostic tests, observational techniques.
• Summative. Function: determines end-of-course achievement for assigning grades or certifying mastery of objectives. Illustrative instruments: teacher-made survey tests, performance rating scales, product scales.

D. Method of Interpreting Results
• Criterion referenced. Function: describes student performance according to a specified domain of clearly defined learning tasks (e.g., adds single-digit whole numbers). Illustrative instruments: teacher-made tests, custom-made tests from test publishers, observational techniques.
• Norm referenced. Function: describes student performance according to relative position in some known group (e.g., ranks 10th in a classroom group of 30). Illustrative instruments: standardized aptitude & achievement tests, teacher-made survey tests, interest inventories, adjustment inventories.

Other terms used to describe classroom assessment procedures [4]

Informal vs Standardized Tests. Informal tests are those constructed by classroom teachers. Those designed by test specialists and administered, scored, and interpreted under standard conditions are called standardized tests.

Individual vs Group Tests. Some tests are administered on a one-on-one basis using careful oral questioning (e.g., individual intelligence tests), whereas others can be administered to a group of individuals.

Mastery vs Survey Tests. Some achievement tests measure the degree of mastery of a limited set of specific learning outcomes, whereas others measure a student's general level of achievement over a broad range of outcomes. Mastery tests typically use criterion-referenced interpretations, and survey tests tend to emphasize norm-referenced interpretations. But some criterion-referenced interpretations are also possible with carefully prepared survey tests.

Supply vs Fixed-Response Tests. Some tests require examinees to construct the answer (e.g., essay tests), whereas others require them to select one of two or more fixed-response options (e.g., multiple-choice tests).

Speed vs Power Tests. A speed test is designed to measure the number of items an individual can complete in a given time. A power test is designed to measure level of performance under ample time conditions; usually, items are arranged in order of increasing difficulty.


Objective vs Subjective Tests. An objective test is one on which equally competent examinees will obtain the same scores (e.g., multiple-choice test). A subjective test is one in which the scores are influenced by the opinion or judgment of the person doing the scoring (e.g., essay test).

Some of these procedures are presented in greater detail in the following discussions.

Why Do We Assess? Placement, Formative, Diagnostic, Summative, Accountability Purposes

Placement assessment: Concerned with the student's entry performance [4]. Focuses on questions such as the following:
(a) Does the student possess the knowledge and skills needed to begin the planned instruction?
(b) To what extent has the student already developed the understanding and skills that are the goals of the planned instruction? Should the student skip certain topics, or be placed in a more advanced course?
(c) To what extent do the student's interests, work habits, and personality characteristics indicate that one mode of instruction might be better than another (e.g., group vs independent study)?

Formative assessment: Ungraded testing used before or during instruction to aid in planning and diagnosis [10]

Assessment that occurs during and after instruction to provide feedback to teachers and students [3]

Used to monitor learning progress during instruction to provide continuous feedback to both students and teachers concerning learning successes and failures [4]. Specific learning errors and misconceptions are identified for improvement of learning and instruction.

A planned process in which assessment-elicited evidence of students' status is used by teachers to adjust their ongoing instructional procedures or by students to adjust their current learning tactics [6]

Diagnostic assessment: Assessment of specific skills used to identify students' needs and to guide instruction [8]

A highly specialized procedure used to determine the causes of persistent learning problems and to formulate a plan for remedial instruction [4].

These persistent or recurring learning difficulties are those left unresolved by the standard corrective prescriptions of formative assessment. If a student continues to experience failure in a specific subject despite the use of prescribed alternative methods of instruction, then a more detailed diagnosis is necessary.

If formative assessment provides first aid treatment for simple learning problems, diagnostic assessment searches for the underlying causes of those problems that do not respond to first aid treatment. Thus, diagnostic assessment is much more comprehensive and detailed.

Summative assessment: Assessment after instruction is finished to document student learning/performance/achievement [3, 7, 10]; also called formal assessment [7]

Final evaluations of students' achievement of an objective [8]

When tests are used to make final judgments about students or the quality of a teacher's instruction [6]

Designed to determine the extent to which instructional goals have been achieved [4]. Used primarily for assigning course grades or for certifying student mastery of the intended learning outcomes. Also provides information for judging the appropriateness of the course objectives and the effectiveness of instruction.

Accountability: Practice of evaluating teachers and schools for the extent to which they meet measurable educational goals [9].

Making teachers and schools responsible for student learning, usually by monitoring learning with high-stakes tests [10]

High-stakes testing: Standardized tests whose results have powerful influences when used by school administrators, other officials, or employers to make decisions [10].

Using tests in a way that will have important consequences for the student, affecting such decisions as whether the student will be promoted or be allowed to graduate [7].

Use of test scores as the sole or primary basis for making decisions important to the lives of students (and teachers) [9].

Student evaluations serve six primary purposes [8]:
1. Feedback to students
2. Feedback to teachers
3. Information to parents


4. Information for selection and certification [e.g., college entrance exams, Licensure Examination for Teachers (LET)]
5. Information for accountability [e.g., National Achievement Test (NAT)]
6. Incentives to increase student effort

Information for accountability. Often, evaluations of students serve as data for the evaluation of teachers, schools, districts, or even provinces and regions. Nationwide testing programs, such as the National Achievement Test (NAT), allow for ranking of schools in terms of student performance. In addition to national tests, school districts may use tests for similar purposes (for example, in grade levels not tested by the NAT). These test scores are also used in evaluations of principals, teachers, and superintendents. Consequently, these tests are taken very seriously.

Incentives to increase student effort. One important use of evaluations is to motivate students to give their best efforts. Formative evaluations might even be made "on the fly" during instruction through oral or brief written learning probes. In contrast, summative evaluation refers to tests of student knowledge at the end of instructional units (such as final exams). Summative evaluations may or may not be frequent, but they must be reliable and (in general) should allow for comparisons among students. Summative evaluations should also be closely tied to formative evaluations and to course objectives.

Incentive: Positive or negative stimuli or events that can motivate a student's behavior [7].

An object or event that encourages or discourages behavior [10]

There are two general uses or functions for assessment: formative and summative. Formative assessment occurs before or during instruction. The purposes of formative assessment are to guide the teacher in planning and improving instruction and to help students improve learning.

In other words, formative assessment helps form instruction and provides feedback that is non-evaluative, supportive, timely, and specific. Students often take a formative test before instruction, a pretest that helps the teacher determine what students already know.

Teachers sometimes give a test during instruction to see what areas of weakness remain, so they can direct teaching toward the problem areas. These formative tests are not graded, so students who tend to be very anxious about "real" tests may find this low-pressure practice in test taking especially helpful. Also, the feedback from formative tests can help students become more self-regulated in their learning.

Summative assessment occurs at the end of instruction. Its purpose is to let the teacher and the students know the level of accomplishment attained. Summative assessment, therefore, provides a summary of accomplishment. The final exam is a classic example.

The distinction between formative and summative assessment is based on how the results are used. And any kind of assessment—traditional, performance, project, oral, portfolio, and so on—can be used for either formative or summative purposes.

If the purpose of the assessment is to improve your teaching and help students guide their own learning, then the evaluation is formative. But if the purpose is to evaluate final achievement (and help determine a course grade), the assessment is summative. In fact, the same assessment could be used as a formative evaluation at the beginning of the unit and as a summative evaluation at the end.

The formative uses of assessment are really the most important in teaching. In fact, Popham believes "any teacher who uses tests dominantly to determine whether students get high or low grades should receive a solid F in classroom assessment" (2008, p. 256). Tests and all assessments should be used to help teachers make better instructional decisions.

Using Tests to Make Instructional Decisions

The best use of assessment is to plan, guide, and target instruction. Here are some decisions that teachers can make based on assessment results.

• Decision: What to focus on in teaching? Typical assessment strategy: pre-assessment (before instruction). Decision options: whether to provide instruction for specific objectives.
• Decision: How long to keep teaching toward a particular instructional objective? Typical assessment strategy: en route assessment of students' progress (during instruction). Decision options: whether to continue or end instruction for an objective (for an individual or the whole class).
• Decision: How effective was an instructional sequence? Typical assessment strategy: comparing students' posttest to pretest performances. Decision options: whether to retain, discard, or modify a given instructional sequence the next time it is used.


Classroom vs. Standardized Assessment

Classroom assessment: The collection, evaluation, and use of information for teacher decision making [3]

Selected and created by teachers and can take many different forms—unit tests, essays, portfolios, projects, performances, oral presentations, and so on [10].

Every type of information a teacher takes in about students in the classroom—including test results, grades on homework and class presentations, and the informal conversations teachers have with their students over the course of the year [9].

Figure. Basic steps in classroom testing and assessment [4]

Standardized assessment: Uses tests given, usually nationwide (large-scale), under uniform conditions and scored according to uniform procedures [10]

Tests are used to assess students' performance in different domains and allow a student's performance to be compared with the performance of other students at the same age or grade level on a national basis [7]

Norm group: Large sample of students serving as a comparison group for scoring tests [10]

Standardized tests are administered, scored, and interpreted in a uniform manner—same directions, time limits, and scoring for all. They are used to meet the growing demands for accountability.

Comparative advantages of standardized and informal classroom tests of achievement [4]

• Learning outcomes and content measured. Standardized achievement tests: major outcomes and content common to the majority of schools; tests of basic skills and complex outcomes adaptable to many local situations; content-oriented tests do not reflect the emphasis or timeliness of the local curriculum. Informal achievement tests: well adapted to the outcomes and content of the local curriculum; flexibility affords continuous adaptation of measurement to new materials and changes in procedure; adaptable to various-size work units; too often neglect complex learning outcomes.
• Quality of test items. Standardized: general quality of items is high; written by specialists, pretested, and selected on the basis of effectiveness. Informal: quality of items is unknown unless an item file is maintained and used; quality is typically lower than standardized because of the teacher's limited time and lack of opportunity to pretest items.
• Reliability. Standardized: reliability high, commonly between .80 and .95, frequently around .90. Informal: reliability usually unknown; can be high if carefully constructed.
• Administration and scoring. Standardized: procedures standardized; specific instructions provided. Informal: uniform procedures favored but may be flexible.
• Interpretation. Standardized: scores can be compared with those of norm groups; the test manual and other guides aid interpretation and use. Informal: score comparisons and interpretations limited to the local school situation; can be interpreted in light of known instructional history.

Examinations such as the NAT, the LET, and college entrance examinations are examples of standardized tests. Unlike teacher-made tests, a standardized test


is typically given under the same, "standardized" conditions to thousands of students who are similar to those for whom the test is designed. This allows the test developer to establish norms to which any individual score can be compared.

Standardized tests are used for a wide range of purposes at all levels of education:

• Standardized tests are typically carefully constructed to provide accurate information about students' levels of performance.
• They are often used to select students for entry or placement in specific programs.
• They are also used to diagnose individual students' learning problems or strengths.
• They can also evaluate students' progress and teachers' and schools' effectiveness.

[In order to determine the performance of schools and thus ensure accountability, schools need to be ranked based on similar characteristics/uniform criteria. One way to do this is through standardized testing.]

The accountability movement stems in part from the public's loss of confidence in education. Legislators (among others), upset by examples of students graduating from high school unable to read or compute, have demanded that schools establish higher standards and that students achieve them. The accountability movement has its critics, however. Many argue that minimum competency testing focuses schools on minimums rather than maximums. Others are concerned that schools will teach only what is tested, emphasizing reading and mathematics at the expense of, for instance, science and social studies, and emphasizing easily measured objectives (such as punctuation) over more important hard-to-measure objectives (such as composition).

What is the main difference between standardized and non-standardized tests? Standardized tests provide information to test takers about their performance when compared to others of their age or grade level who have already taken the test (the norm sample). Non-standardized tests cannot provide the same information, because a norm sample is not available for purposes of comparison.

How are standardized test results used in student selection, placement, diagnosis, and evaluation? Standardized tests are generally used to compare students to other students (e.g., locally or nationally) in ways that cannot be accomplished by teacher-made tests. Sometimes standardized tests are used for selection or placement, such as when students take college entrance tests. Standardized tests are also sometimes used for diagnosis, such as when employed to determine the presence of a learning disability or mental retardation. They can also be used to assess development on several dimensions, such as cognitive, physical, or social. Finally, standardized tests are often used to evaluate student, teacher, and school performance. Such tests inform parents, administrators, and governing bodies of the efficacy of the education process.

Three kinds of standardized tests are commonly used in school settings:
1. aptitude tests,
2. norm-referenced achievement tests, and
3. criterion-referenced achievement tests.

Aptitude vs. Achievement Tests

Aptitude test: Designed to measure general abilities and to predict future performance [8]

Used to predict a student's ability to learn a skill or accomplish something with further education and training [7]

Designed to assess student potential, measuring general abilities and predicting future performance [7]. These include tests of general intelligence, like the IQ test, and those that test scholastic aptitudes, such as college entrance exams. Intelligence tests are administered either to individuals or in groups.

Achievement tests: Standardized tests measuring how much students have learned in a given content area [10]

Standardized tests measuring how much students have learned in a given context [8]

A test that measures what the student has learned or what skills the student has mastered [7]

Tests that measure accomplishments in either single or multiple areas of endeavor [9]

Achievement tests, on the other hand, measure how much students have learned in a given context. Tests such as the NAT are used to measure individual or group achievement in a variety of subject areas.

Criterion- vs Norm-Referenced Interpretations

Norm-referenced assessment: A test or other type of assessment designed to provide a measure of performance that is interpretable in terms of an individual's relative standing in some known group [4]


Testing in which scores are compared with the average performance of others [10]

Assessment that compares the performance of one student against the performance of others [8]

Standardized tests in which a student's score is interpreted by comparing it with how others (the norm group) performed [7]

Tests that compare test takers' scores with the average performance of the test takers [9]

Norm group: The group of individuals previously tested that provides a basis for interpreting a test score [7]

Norms: Standards that are derived from the test scores of a sample of people who are similar to those who will take the test and that can be used to interpret the scores of future test takers [8]

Criterion-referenced assessment: A test or other type of assessment designed to provide a measure of performance that is interpretable in terms of a clearly defined and delimited domain of learning tasks. Also known as standards based, objective referenced, content referenced [4]

Testing in which scores are compared to a set performance standard [10]

Assessments that rate how thoroughly students have mastered specific skills or areas of knowledge [8]

Standardized tests in which the student's performance is compared with established criteria [7]

These two types of assessments are best viewed as the ends of a continuum rather than as a clear-cut dichotomy [4]. As shown in the following continuum, the criterion-referenced test emphasizes description of performance, and the norm-referenced test emphasizes discrimination among individuals:

Criterion-Referenced Assessment    Combined Type    Norm-Referenced Assessment
(Description of Performance)  (Dual Interpretation)  (Discrimination Among Individuals)

The answers given on any type of test have no meaning by themselves; we must make some kind of comparison in order to interpret test results. There are two basic types of comparisons. In the first, a test score is compared to the scores obtained by other people who have taken the same test. This is called a norm-referenced comparison. The second type is criterion-referenced: here, the score is compared to a fixed standard or minimum passing score. The same test can be interpreted either in a norm-referenced or a criterion-referenced way.

NORM-REFERENCED TEST INTERPRETATIONS. In norm-referenced testing and grading, the people who have taken the test provide the norms for determining the meaning of a given individual's score. You can think of a norm as being the typical level of performance for a particular group. By comparing the individual's raw score (the actual number correct) to the norm, we can determine if the score is above, below, or around the average for that group.

There are at least four types of norm groups (comparison groups) in education:
1. the class or school itself,
2. the school district,
3. national samples, and
4. international samples.

Students in national norm groups used for large-scale assessment programs are tested one year, and then the scores for that group serve as comparisons or norms every year for several years until the test is revised, or re-normed.

The norm groups are selected so that all socioeconomic status (SES) groups are included in the sample. Because students from high-SES backgrounds tend to do better on many standardized tests, a high-SES school district will almost always have higher scores compared to the national norm group.

Norm-referenced tests cover a wide range of general objectives. They are especially appropriate when only the top few candidates can be admitted to a program.

However, norm-referenced measurement has its limitations. The results of a norm-referenced test do not tell you whether students are ready to move on to more advanced material. For instance, knowing that two students are in the top 3% of the class on a test of algebraic concepts will not tell you if they are ready to move on to advanced math; everyone else in the class may have a limited understanding of the algebraic concepts.

Nor are norm-referenced tests particularly appropriate for measuring affective and psychomotor objectives. To measure individuals' psychomotor learning, you need a clear description of standards. (Even the best gymnast in school performs certain exercises better than others and needs specific guidance about how to improve.)
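The two basic types of comparisons described above can be sketched side by side. This is a hypothetical Python illustration: the 40-item test, the norm-group scores, and the 80% mastery cutoff are invented assumptions. It shows how one raw score can rank well above average in a norm group and still fall short of a fixed criterion.

```python
# Hypothetical sketch: one raw score, two interpretations.
# The 40-item test, norm-group scores, and 80% cutoff are assumptions.

def percentile_rank(score, norm_group):
    """Norm-referenced: percent of the norm group scoring below this score."""
    below = sum(s < score for s in norm_group)
    return 100 * below / len(norm_group)

def criterion_judgment(score, n_items, cutoff=0.80):
    """Criterion-referenced: compare the score to a fixed standard."""
    return "meets criterion" if score / n_items >= cutoff else "below criterion"

norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]  # raw scores out of 40
raw_score = 30

print(percentile_rank(raw_score, norm_group))  # 70.0 -> above most of the group
print(criterion_judgment(raw_score, 40))       # below criterion (30/40 = 75%)
```

Here the same raw score sits above most of the norm group yet falls below the fixed standard, which is why the two interpretations answer different questions about the same performance.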

2nd Sem 2020–2021 jmmillare@usm.edu.ph Page 8 of 11



In the affective area, attitudes and values are personal; comparisons among individuals are not really appropriate. For example, how could we measure an "average" level of political values or opinions?

Finally, norm-referenced tests tend to encourage competition and comparison of scores. Some students compete to be the best. Others, realizing that being the best is impossible, may compete to be the worst. Both goals have their casualties.

CRITERION-REFERENCED TEST INTERPRETATIONS. When test scores are compared, not to the scores of others, but to a given criterion or standard of performance, this is criterion-referenced testing or grading. To decide who should be allowed to drive a car, it is important to determine just what standard of performance works for selecting safe drivers. It does not matter how your test results compare to the results of others. If your performance on the test was in the top 10%, but you consistently ran through red lights, you would not be a good candidate for receiving a license, even though your score was high.

Criterion-referenced tests measure the mastery of very specific objectives. The results of a criterion-referenced test should tell the teacher exactly what the students can and cannot do, at least under certain conditions. For example, a criterion-referenced test would be useful in measuring the students' ability to add three-digit numbers. A test could be designed with 20 different problems, and the standard for mastery could be set at 17 correct out of 20. (The standard is often somewhat arbitrary and may be based on such things as the teacher's experience.) If two students receive scores of 7 and 11, it does not matter that one student did better than the other because neither met the standard of 17. Both need more help with addition.

When teaching basic skills, comparison to a preset standard is often more important than comparison to the performance of others. It is not very comforting to know, as a parent, that your child is better in reading than most of the students in her class if none of the students is reading at grade level. Sometimes standards for meeting the criterion must be set at 100% correct. You would not like to have your appendix removed by a surgeon who left surgical instruments inside the body only 10% of the time.

Criterion-referenced tests are not appropriate for every situation. Many subjects cannot be broken down into a set of specific objectives. And, although standards are important in criterion-referenced testing, they can often be arbitrary, as you have already seen. When deciding whether a student has mastered the addition of three-digit numbers comes down to the difference between 16 or 17 correct answers, it seems difficult to justify one particular standard over another. Finally, at times, it is valuable to know how the students in your class compare to other students at their grade level, both locally and nationally.

You can see that each type of test is well suited for certain situations, but each also has its limitations.

Comparison of Norm-Referenced Tests (NRTs) and Criterion-Referenced Tests (CRTs) [4]

Common Characteristics of NRTs and CRTs:
1. Both require specification of the achievement domain to be measured.
2. Both require a relevant and representative sample of test items.
3. Both use the same types of test items.
4. Both use the same rules for item writing (except for item difficulty).
5. Both are judged by the same qualities of goodness (validity and reliability).
6. Both are useful in educational assessment.

Differences between NRTs and CRTs:

NRTs:
1. Typically cover a large domain of learning tasks, with just a few items measuring each specific task.
2. Emphasize discrimination among individuals in terms of relative level of learning.
3. Favor items of average difficulty and typically omit very easy and very hard items.
4. Interpretation requires a clearly defined group.

CRTs:
1. Typically focus on a delimited domain of learning tasks, with a relatively large number of items measuring each specific task.
2. Emphasize description of what learning tasks individuals can and cannot perform.
3. Match item difficulty to learning tasks, without altering item difficulty or omitting easy or hard items.
4. Interpretation requires a clearly defined and delimited achievement domain.
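The preset-standard logic described above (mastery set at 17 correct out of 20) can be sketched in a few lines. This is purely illustrative, not from the cited sources; the cutoff value follows the three-digit-addition example:

```python
# Criterion-referenced interpretation: each student is judged against a
# preset cutoff (17 of 20 items), not against other students' scores.

MASTERY_CUTOFF = 17  # standard chosen by the teacher, as in the example

def has_mastered(items_correct, cutoff=MASTERY_CUTOFF):
    return items_correct >= cutoff

# Scores of 7 and 11 both fall short of the standard, so their relative
# order does not matter: both students need more help with addition.
for score in (7, 11, 18):
    print(score, has_mastered(score))
```

Notice that lowering the cutoff from 17 to 16 flips some classifications, which is exactly the arbitrariness of standard-setting the text warns about.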


Norm-Referenced vs. Criterion-Referenced Interpretation in Formative and Summative Assessment

Formative assessment:
- Norm-referenced interpretation is NOT ideal under formative testing conditions, as information about individual progress is more informative at this point and can be used to adjust the difficulty of lessons.
- Criterion-referenced interpretation is ideal under formative testing conditions, as teachers want to know which students are having trouble with the concepts being taught.

Summative assessment:
- Summative testing is employed at the end of a unit being taught. Norm-referenced interpretation allows teachers to compare the performance of individuals against class norms.
- Criterion-referenced interpretation is still ideal under summative testing conditions, as it provides feedback about which students mastered or failed to master unit concepts.

Norm-referenced achievement tests are designed to assess achievement of individuals against a normative sample. These tests allow parents, teachers, school administrators, and colleges and universities to assess individual percentile rankings with respect to average test takers. Entrance exams are used by colleges as one of a variety of sources to determine college admission. Alone, an entrance exam does not provide enough information to predict college performance accurately, as many other factors (e.g., motivation) play an important role in college performance. However, when used along with other sources, such as interviews and GPA, entrance tests may provide useful information to admissions officers.

Criterion-referenced achievement tests are designed to test individual performance against a criterion. For instance, driver tests assess individual ability to drive a car in traffic safely. Appropriate uses of criterion-referenced achievement tests include assessment of individual ability for diagnostic purposes and the correction of deficits with reference to the criterion. For example, when a student fails a driving test, the failure does not mean the student cannot ever drive. Instead, the failure indicates more practice is necessary.

What is a minimum competency test, and how does it hold teachers and schools accountable for what students learn? Minimum competency tests are criterion-referenced tests that focus on important skills students are expected to master for promotion or graduation. Schools are held accountable because they must prepare students to meet selected criteria, such as reading and mathematics.

Traditional vs. Alternative Assessment

Authentic assessments: Assessments that ask students to use skills and knowledge to solve problems in the same manner as if the students were completing real-world tasks; tests designed to allow students to show their achievements or abilities in a real-life context [9]

Authentic task: Tasks that have some connection to real-life problems the students will face outside the classroom [10]

Performance assessments: Any form of assessment that requires students to carry out an activity or produce a product in order to demonstrate learning [10]. Assessment that requires creating answers or products that demonstrate knowledge and skill; examples include writing an essay, conducting an experiment, carrying out a project, solving a real-world problem, and creating a portfolio [7]

Portfolio: A systematic and organized collection of a student's work that demonstrates the student's skills and accomplishments [7]

Portfolio assessment: Assessment of a collection of a student's work to show growth, self-reflection, and achievement [8]

Scoring rubrics: Rules that are used to determine the quality of a student's performance [10]. A guide that lists specific criteria for grading and scoring academic papers, projects, or tests [7]

After much criticism of standardized testing, critics have developed and implemented alternative assessment systems that are designed to avoid the problems of typical multiple-choice tests. The key idea behind the testing alternatives is that students should be asked to document their learning or demonstrate that they can actually do something real with the information and skills they have learned in school.

One popular form of alternative assessment is called portfolio assessment: the collection and evaluation of samples of student work over an extended period. Teachers may collect student compositions, projects, and other evidence of higher-order functioning and use this evidence to evaluate student progress over time. Portfolio assessment has important uses when teachers want to evaluate students for reports to parents or other within-school purposes.

When combined with a consistent and public rubric, portfolios showing improvement over time can provide powerful evidence of change to parents and to students
themselves. A key requirement for the use of performance grading is collection of work samples from students that indicate their level of performance on a developmental sequence. Collecting and evaluating work that students are already doing in class (such as compositions, lab reports, or projects) is called portfolio assessment.

Selected- vs. Constructed-Response Tests

Constructed-response items: Items that require students to write out information rather than select a response from a menu [7]

Selected-response items: Test items in which respondents can select from one or more possible answers, such that the scorer is not required to interpret their response [8]. Test items with an objective format in which student responses can be scored quickly [7]

Test items that can be scored correct or incorrect without the need for interpretation are referred to as selected-response items. Multiple-choice, true-false, and matching items are the most common forms. Note that the correct answer appears on the test and the student's task is to select it. There is no ambiguity about whether the student has or has not selected the correct answer.

Constructed-response items require the student to supply rather than to select the answer. They also usually require some degree of judgment in scoring. The simplest form is fill-in-the-blank items, which can often be written to reduce or eliminate ambiguity in scoring. Additionally, there are short essay questions and long essay items.

General Principles of Assessment [4]

1. Clearly specifying what is to be assessed has priority in the assessment process. Specify the intended learning goals before selecting the assessment procedures to use.

2. An assessment procedure should be selected because of its relevance to the characteristics or performance to be measured. Criteria such as objectivity, accuracy, or convenience are important, but they are secondary to the main criterion: Is this procedure the most effective method for measuring the learning or development to be assessed? Every assessment procedure is appropriate for certain uses and inappropriate for others. We need a close match between the intended learning goals and the types of assessment tasks used.

3. Comprehensive assessment requires a variety of procedures. No single type of instrument or procedure can assess the vast array of learning and development outcomes emphasized in a school program. A complete picture of student achievement and development requires the use of many different assessment procedures.

4. Proper use of assessment procedures requires an awareness of their limitations. The cruder the instrument, the greater its limitations and, consequently, the more caution required in its use.

5. Assessment is a means to an end, not an end in itself. Assessment is best viewed as a process of obtaining information on which to base important educational decisions.

The process of assessment is likely to be most effective when guided by these general principles. In sum, these principles emphasize the importance of:

1. Clearly specifying what is to be assessed
2. Selecting assessment procedures in terms of their relevance
3. Using a variety of assessment procedures
4. Being aware of their limitations
5. Regarding assessment as a means to an end and not an end in itself

TEXTBOOK/REFERENCES

1. Anderson, L.W., & Krathwohl, D.R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. New York, NY: Longman.
2. De Guzman, E.S., & Adamos, J.L. (2015). Assessment of learning 1. Quezon City: Adriana Publishing Co., Inc.
3. McMillan, J.H. (2018). Classroom assessment: Principles and practice that enhance student learning and motivation. New York, NY: Pearson Education, Inc.
4. Miller, M.D., Linn, R.L., & Gronlund, N.E. (2009). Measurement and assessment in teaching. New Jersey: Pearson Education, Inc.
5. Navarro, R.L., Santos, R.G., & Corpuz, B.B. (2019). Assessment of learning 1. Quezon City: Lorimar Publishing, Inc.
6. Popham, W.J. (2017). Classroom assessment: What teachers need to know. USA: Pearson Education, Inc.
7. Santrock, J.W. (2018). Educational psychology. New York, NY: McGraw-Hill Education.
8. Slavin, R.E. (2018). Educational psychology: Theory and practice. New York, NY: Pearson Education, Inc.
9. Sternberg, R.J., & Williams, W.M. (2009). Educational psychology. Upper Saddle River, NJ: Pearson/Merrill.
10. Woolfolk, A. (2016). Educational psychology. Harlow, England: Pearson Education Limited.

INSTRUCTIONAL MATERIAL PREPARED BY:
JEAN M. MILLARE
jmmillare@usm.edu.ph
Secondary Education Department
College of Education
University of Southern Mindanao
