
BAHIR DAR UNIVERSITY

COLLEGE OF EDUCATION AND BEHAVIORAL SCIENCES


DEPARTMENT OF PSYCHOLOGY

ASSESSMENT AND EVALUATION OF LEARNING


(PGDT 423)

Distance Course Module


(For PGDT Program Trainees)

Bahir Dar University


August, 2022
Bahir Dar
Module Title: Assessment and Evaluation of Learning
Prepared by: Asnakew Tagele (Dr.) & Tamiru Delelegn
Revised by: Mulualem Alemayehu

Assessment and Evaluation of Learning

Course Code: PGDT 423

Credit hours: 3

Contact hours per week: 4 hours

Bahir Dar University


August, 2022
Bahir Dar
INTRODUCTION TO THE MODULE
Overview
This module for the course Educational Assessment and Evaluation of Learning is designed
to equip you, as a prospective secondary school teacher, with the conceptual and practical
skills of assessing and evaluating students' learning. It incorporates descriptions of important
concepts that help to clarify the assessment process; elaboration of the major principles and
procedures of the assessment process; the different tools and strategies that can be used for
assessing students' learning; different mechanisms that are used to maintain the quality of
assessment tools and procedures; and the ethical standards of assessment. It will also provide
you with an opportunity to design and construct useful test items for assessment and
evaluation in your respective subjects.

By the end of the module, you will be able to understand the interconnections among
instructional objectives, assessment and evaluation. You will also have a new perspective on
how to use assessment and evaluation results. Furthermore, the module seeks to equip you
with the basic knowledge and skills that are important for developing tools for the assessment
and evaluation of students' learning and educational attainments.

Throughout this module there are different in-text questions which help you to pause your
reading for a moment and reflect on what you are studying. In addition, there are many
activities that you will come across (at least one in each section) and should attempt before
proceeding from one section to the next. Therefore, you need to make a serious effort to
reflect on/answer each question and activity if you are to have a deep and meaningful
understanding of the concepts under discussion and be a successful learner.

Wishing you a good and successful learning journey, you may start studying your module
right now.

Icons Used in this Module
Dear learner, the following icons are used throughout this module. Study the descriptions of
what they symbolize carefully before using the module.

The icons and their descriptions are as follows.

This tells you there is an introduction to the module, unit or section.

This tells you that there are learning outcomes for the module or unit.

This tells you there is a question to answer or think about in the text.

This tells you to note and remember an important point.

This tells you there is an activity for you to do.

This tells you there is a Self-Test Exercise for the unit for you to do.

This tells you there is a checklist of the main points.

This tells you that these are the answers to the activities and self-test questions.

BRIEF DESCRIPTION OF THE COURSE


This course, Assessment and Evaluation of Learning (PGDT 423), is a one-semester,
three-credit-hour course designed and made available to PGDT program trainees/students
undertaking their first-degree programme in education. There are two prerequisite courses
that you need to take before studying this module: Secondary School Curriculum and
Instruction; and Psychological Foundations of Learning and Development. Also, it is
expected that you have studied some other courses in education.

The course provides an overview of assessment and evaluation, educational objectives,
types of tests, test development and administration, the quality of classroom tests, and
performance assessments. The materials have been developed to suit distance learners by
using examples, in-text questions and self-test exercises. This course aims to give you an
understanding of the fundamental concepts and principles of educational assessment and
evaluation and how these can be applied in the assessment of students' learning outcomes
and classroom achievements. Some overall objectives have been specified so as to achieve
the aims set out above. In addition, each unit has its own specific objectives, which are
always included at the beginning of the unit. You should try to read them before you start
working through the unit.

The overall aim of this course, Educational Assessment and Evaluation of Learning, is to
acquaint and equip you with the knowledge and skills of assessment and evaluation of
learning which you will need to apply as a teacher. In this course, you will learn the basic
elements that will enable you to function effectively as a teacher, especially in the area of
student assessment. This module discusses the development and uses of classroom
assessments. It is designed to equip teachers with the knowledge of educational assessment
and evaluation that enables them to make sound decisions.

You may also refer to the unit objectives during your study of each unit to check on your
progress and again after completing the unit. In this way, you can be sure that you have done
what was required of you by the unit. The objectives of the whole course are set out below.
By meeting these objectives, you should have achieved the aims of the course as a whole.

There are some tutorial sessions which are linked with this course. You are advised to
attend these sessions. Details of the times and venues of these tutorials will be made known
to you by your study center.

Working through this Course


To complete this course successfully, you are required to read the study units and other
reference books on assessment and evaluation. Each unit contains self-assessment exercises.
You are also required to complete and submit the assignments for assessment purposes. At
the end of the course or semester, you are expected to sit for a final examination.

How the Course is Evaluated and Graded


There are two types of assessment in this course. The first is the tutor-marked assignment
and the second is the examination at the end of the course. The assignment will be given to
you on registration in this course. In doing the assignment, you are required to apply
information, knowledge and methods drawn from the course. You must submit the
assignment to your tutor for assessment and grading according to the deadlines given to you.
The assignments count for 30% of your total course mark. Note that the marks you obtain
from these assignments will count towards the final mark you will obtain for this course. The
presentation schedule, which includes the dates for the completion and submission of the
assignments and attendance at tutorials, will be available to you at your study center. Further
information on the assignments will be given to you by your tutorial facilitator. At the end of
the course, you are required to write an examination of not more than three hours'
duration. This examination will count for 70% of your total course mark.
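
To make the weighting concrete, here is a purely illustrative calculation (the figures are
hypothetical and not taken from the module): a student who earns 24 of the 30 marks
allotted to the tutor-marked assignments and 56 of the 70 marks allotted to the final
examination would obtain a total course mark of 24 + 56 = 80 out of 100.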

You might find it useful to review your self-tests, tutor-marked assignments and the
comments on them before the examination. The examination covers all parts of the course.
The examination questions will reflect the types of self-tests, activities and tutor-marked
assignments which you have already encountered.

How to Get the Best from This Course


Distance learning involves the teacher and the learners working apart from each other. It
means that you can read and study the self-learning materials at your own time, pace and
place, and at your own convenience. The self-learning material has replaced the lecturer and
will guide and direct you the same way the lecturer would do in class. Just as the lecturer
might give you some exercises and assignments, the study units provide exercises and
assignments for you to do at the appropriate points. The study units all follow the same
format: they start with the contents, the introduction, the objectives, the subject matter, and
the summary.

Tutors and Tutorials


There are tutorial sessions provided in support of this course. You should try to attend the
tutorials. This is the only chance to have face-to-face contact with your tutor and your peers
and to ask questions which are answered instantly.

The dates, times and venues will be given to you at the center. Your tutor will mark and
comment on your assignments, keep watch on your progress and/or difficulties, and provide
assistance to you during the course. You must submit your assignments before the due dates.
They will be marked and returned to you as soon as possible.

Course Objectives
General Objectives of the Course
On successful completion of the course, you should be able to:
• Know the basic concepts of testing, measurement, assessment and evaluation.
• Understand the role and uses of assessment and evaluation in the educational process.
• State the need for assessment and evaluation in the process of instruction.
• Appreciate the roles of instructional objectives in the education process.
• Explain how to measure educational outcomes in cognitive, affective and psychomotor
domains.
• Identify the types of assessment techniques and list the activities involved in different
phases of instruction.
• Use appropriate assessment strategies and techniques to assess and evaluate learning
outcomes.
• Construct classroom assessment tests based on guidelines and principles of item
construction.
• Apply test and item analysis statistics to judge strengths and weaknesses of test items
and improve their qualities.
• Justify the need for the concepts of reliability and validity in test construction.
• Know the functions and value of performance assessment.

TABLE OF CONTENTS
INTRODUCTION TO THE MODULE ……………………………………….…………….. i
BRIEF DESCRIPTION OF THE COURSE ………………………………………………… ii
GENERAL OBJECTIVES OF THE COURSE ……………………………………….……. v
CHAPTER ONE
ASSESSMENT: CONCEPTS, PURPOSES, AND PRINCIPLES ……………………… 1
1.1. Definitions of Basic Terms ……………………………………………………. 1
1.2. Purposes of Assessment in Education …………………………………………. 8
1.3. Principles of Assessment ……………………….…………………………….. 12
1.4. Basic Assumptions of Assessment ……………………………………………. 14
1.5. Assessment, Learning and the Involvement of Students ……………………… 16
CHAPTER TWO
ASSESSMENT STRATEGIES, METHODS AND TOOLS ………………………………. 22
2.1. Types of Assessment ……………………….…………………………………. 22
2.2. Phases of Assessment in the Instructional Process ……………………………. 28
2.3. Assessment Strategies …………………………………………………………. 30
CHAPTER THREE
ROLES OF OBJECTIVES IN EDUCATION ……………………………………………… 37
3.1. Meaning of Educational Objectives …………………………………………… 37
3.2. Aims, Goals and Objectives …………………………………………………… 38
3.3. Benefits of Stating Instructional Objectives ………..…………………………. 39
3.4. Levels and Forms of Instructional Objectives ………………………………… 42
3.5. Criteria in Stating Instructional Objectives …………………………………… 43
3.6. Taxonomy of Educational Objectives ………………………………………… 47
CHAPTER FOUR
DEVELOPING CLASSROOM ACHIEVEMENT TESTS ………………………………... 60
4.1. Planning for Classroom Test Construction ……………………………………. 62
4.2. General Principles of Writing Teacher-Made Tests …………………………… 67
4.3. Qualities of a Good Item Writer ……………………………………………….. 68
CHAPTER FIVE
WRITING SELECTED RESPONSE TEST ITEMS ……………………………………….. 72
5.1. True or False Item Format ……………………………………………………... 75
5.1.1. Description of the True-False Item Format ………………………………….. 75
5.1.2. Advantages and Limitations of True-False Items ……………………………. 75
5.1.3. Suggestions for Constructing True-False Items ……………………………… 76
5.2. Matching Exercises …………………………………………………………….. 81
5.2.1. Description of the Matching Item Format ……………………………………. 81
5.2.2. Advantages and Limitations of Matching Items ……………………………… 81
5.2.3. Suggestions for Constructing Matching Items ……………………………….. 82
5.3. Multiple Choice Item Formats ………………………………………………….. 84
5.3.1. Description of Multiple Choice Items ………………………………………… 85
5.3.2. Advantages and Limitations of Multiple Choice Items ………………………. 86
5.3.3. Suggestions for Constructing Better Multiple-Choice Questions ……………. 87
5.4. Context-Dependent (Interpretive) Item Format ………………………………… 92
5.4.1. Description of the Interpretive Item Format ………………………………….. 92
5.4.2. Advantages and Limitations of Interpretive Items ……………………………. 94
5.4.3. Suggestions for Constructing Better Interpretive Items ………………………. 96
CHAPTER SIX
CONSTRUCTED RESPONSE ITEM FORMAT …………………………………………. 104
6.1. Short Answer Item Formats ……………………………………………………. 106
6.1.1. Descriptions of Short Answer Items …………………………………………. 106
6.1.2. Advantages and Limitations of Short Answer Items ………………………… 106
6.1.3. Suggestions for Constructing Short Answer Items …………………………... 107
6.2. Essay Item Format ……………………………………………………………... 109
6.2.1. Descriptions of the Essay Item Format ………………………………………. 109
6.2.2. Advantages and Limitations of Essay Items …………………………………. 111
6.2.3. Suggestions for Constructing Essay Items …………………………………… 112
6.2.4. Scoring Essay Items …………………………………………………………... 115
CHAPTER SEVEN
ASSEMBLING, ADMINISTERING, SCORING AND ANALYSING CLASSROOM
TESTS ………………………………………………………………………………………. 121
7.1. Assembling Test Items ……………………………….………………………… 121
7.1.1. Recording Test Items …………………………………………………………. 122
7.1.2. Reviewing Test Items …………………………………………………………. 122
7.1.3. Arranging Items in the Test …………………………………………………… 122
7.1.4. Preparing Directions for the Test ……………………………………………... 123
7.2. Administering Tests ……………………………………………………………. 126
7.3. Scoring the Answers ……………………………………………………………. 128
7.4. Item Analysis …………………………………………………………………… 128
CHAPTER EIGHT
DESIRABLE QUALITIES OF GOOD TESTS ………………………………………..….. 136
8.1. Validity …………………………..……………………………………………... 136
8.1.1. Types of Validity …………………………………………….……………….. 137
8.1.2. Factors Affecting Validity …………………………………………………….. 140
8.2. Reliability ……………………………………………………………………….. 141
8.2.1. Theoretical Representations of the Concept of Reliability …………………… 142
8.2.2. Methods of Estimating Reliability …………………………………………….. 144
8.2.3. Factors Influencing Reliability ………………………………………………… 148
CHAPTER NINE
ASSESSING PERFORMANCE AND PRODUCTS ……………………………………….. 153
9.1. Performance Assessment Defined ………………………………………………. 153
9.2. Qualities of Performance Assessment …………………………………………... 155
9.3. Designing Performance Assessments …………………………………………… 158
9.4. Scoring, Interpreting and Reporting Results of Performance …………………… 161
REFERENCES

UNIT ONE
ASSESSMENT: CONCEPTS, PURPOSES, AND PRINCIPLES

INTRODUCTION
Dear learner, welcome to the first unit of the "Assessment and Evaluation of Learning" course
module. This is an introductory unit that is intended to familiarize you with some basic
concepts, purposes and principles of assessment. In this unit, you will be introduced to the
basic terminologies that you will encounter while studying this course. Specifically, the
concepts of test, measurement, assessment and evaluation will be elaborated. Following this,
the purposes of assessment in education are described. This unit also presents the important
principles that have to be adhered to when assessing students' learning. Finally, different
issues related to assessment, learning and student involvement will be discussed.

Unit Objectives
Upon completion of this unit, you should be able to:
• Know the meaning of test, measurement, assessment and evaluation.
• Distinguish between testing and assessment.
• Compare measurement and evaluation.
• Explain the relationship among test, measurement, assessment and evaluation.
• Examine the purposes of assessment in education.
• Describe the principles and assumptions of assessment.
• Apply the concepts and principles of assessment in the classroom context.

1.1 Definition of Basic Terms


Dear learner, before you start studying educational assessment and evaluation, you need to
have a clear understanding of certain related concepts. This section will introduce you to the
basic terms that are common in assessment and evaluation of learning. The concepts of
test, measurement, assessment and evaluation are defined and clarified so that their
similarities and differences are addressed when it comes to their use in educational settings.
Therefore, in this section, you are introduced to the concepts through their definitions,
differences and similarities. The section also takes you through the interrelationships
among the concepts.

As a teacher, how do you define and understand the meaning of test, measurement,
assessment and evaluation? Are these terms the same or different for you? If different, what
is the difference among them? Please try to answer from your own experience.

Dear learner, before you proceed to the next section, try to give your own definition for each
term. You might have found it difficult to come up with a clear distinction in meaning
between these concepts. This is because they are concepts which may be involved in a single
process. There is also some confusion and difference in the usage of these concepts as
manifested in the literature. Note that test, measurement, assessment and evaluation are
distinct but closely related terms when it comes to their use in education or instructional
settings. Therefore, it is important for you to realize their differences and similarities so as to
use them appropriately and hence avoid misconceptions. Now, let us see the meaning of
these concepts as used in this module.

1. What is a Test?
Dear learner, perhaps a test is the concept that you are more familiar with than the others.
You have been taking tests ever since you started schooling to determine your academic
performance. Tests are also used in workplaces to select individuals for a certain job
vacancy.

In an educational context, a test is a formal, systematic, usually paper-and-pencil procedure
to gather information about pupils' behavior or performance (Airasian, 1996). It is a
measuring tool or instrument typically used to provide information regarding an individual's
ability, knowledge, performance and achievement. It most commonly refers to a set of items
or questions of different types, designed to be presented to one or more students under
specified conditions. It is used to get information regarding the extent to which students
have mastered the subject matter taught and attained the instructional objectives. For
example, most of the time, when you finish a lesson or lessons in a week or a month, your
teacher may give you a test. This test is an instrument given to you by the teacher in order to
obtain data on which you are judged.

As a secondary school teacher, why do you give tests to your students? What is the
purpose of testing? Explain in your own words.

2. What is Measurement?
Dear learner, notice that in our day-to-day life there are different things that we measure. We
measure our height and put it in terms of meters and centimeters. We measure some of our
daily consumptions like sugar in kilograms and liquid in liters. We measure temperature and
express it in terms of degree Celsius. How do we measure these things? Well, definitely we
need to have appropriate instruments such as a meter, a weighing scale, or a thermometer in
order to have accurate measurements. As educators, we frequently measure human attributes
such as academic achievement, aptitudes, interests, personality and so forth.

List the instruments or tools which you can use to measure: achievement in mathematics;
mental abilities such as intelligence; students' performance in technical drawing; and
workers' attitudes towards delays in the payment of salary.

According to Gronlund (1985), measurement is the process of obtaining a numerical
description of the degree to which an individual possesses particular characteristics. It is a
systematic process of obtaining the quantified degree to which a trait or an attribute is
present in an individual or object. In other words, it is the systematic assignment of
numerical values or figures to a trait or an attribute of a person or object.

In an educational context, numerical values for scholastic ability, aptitude, achievement,
etc. can be measured and obtained using instruments such as paper-and-pencil tests. It means
that the values of the attribute are translated into numbers by measurement. For example, a
student receives 9 out of 10 on a spelling test, completes all six steps in the science
experiment, or obtains 72 percent on a mathematics examination. Here, measurement
permits a more objective description of attributes or traits and facilitates comparisons.
Hence, to measure, we have to use certain instruments so that we can conclude that a certain
student is better in a certain subject than another student.

Dear learner, note that measurement is the quantification of, or assignment of a number to,
a quantity according to a certain rule. In other words, we assign numbers to any behavior,
characteristic, property or attribute based on agreed-upon rules. For instance, if a person tells
you that the size of a table is 45, what do you understand by this number? What would be
your next question to the person? If a student said, "I scored 25", what further information
would you need? In both cases you need the units. In the first instance, 45 what?
Centimeters, meters, millimeters, inches… what is the unit? In the latter case, 25 out of what
maximum score? Out of 25, 50, 100? Thus, assigning a number to the length of the table
(i.e., the attribute of the table) and to the behavior (i.e., performance) of the student is not
enough. We need a rule (for example, the units).

Measurement is simply a quantitative description of pupils' behavior. For instance, Ali
scored 70 out of 100 in the course Assessment and Evaluation of Learning. Here, the number
70 is assigned to Ali's performance in the stated course. The quantification of the pupil's
behavior doesn't imply judgments regarding the value or worth of the obtained results. Thus,
the purpose of educational measurement is to represent, using numbers, how much of
'something' is possessed by a person. Note that we are only collecting information; we are
not evaluating. Evaluation is therefore quite different from measurement. Measurement is
also not the same as testing. While a test is an instrument to collect information about
students' behaviors, measurement is the assignment of a quantitative value to the results of a
test or other assessment techniques. Measurement can refer to both the score obtained and
the process itself.

3. What is Assessment?
Dear learner, now let us move on to the next concept, which is assessment. As a teacher,
you will inevitably be involved in assessing learners; therefore, you should have a clear
knowledge of the meaning of assessment.

As a teacher in your school, how would you define the term assessment to a new
teacher? What is (are) the major difference(s) between testing and assessment?

Dear learner, there are many definitions and explanations of assessment in education. Let
us look at a few of them.

In teaching and learning contexts, assessment is a process of collecting, synthesizing, and
interpreting information to aid in decision-making (Nitko, 1996; Airasian, 1996). It is a
process of gathering information to make decisions about learners based on what they know
and can demonstrate as a result of instruction. It refers to gathering information about pupils
and synthesizing this information to help teachers understand their pupils, plan and monitor
instruction, and establish a conducive classroom atmosphere. Assessment focuses not only
on the nature of the learner, but also on what is to be learned and how (Payne, 1997).

Assessment is a general term that includes all the different ways teachers gather information
in their classrooms. Assessment is concerned with the totality of the educational setting and
is an inclusive term; that is, it includes measurement as well as information that cannot be
directly measured. Implicit in this definition is the idea that a variety of sources of
information, both formal and informal, may be used to arrive at a decision. It involves much
more than testing, scoring, and grading paper-pencil tests.

How do teachers collect information about their students' academic progress as
well as about their own teaching? Please list the tools as exhaustively as possible.

The term assessment method has been used to encompass all the strategies and techniques
that might be used to collect information from students about their progress toward attaining
the knowledge, skills, attitudes, or behaviors to be learned. These strategies and techniques
include, but are not limited to, observations, text- and curriculum-embedded questions, oral
questions, paper-and-pencil tests, homework, laboratory work, research papers, project
works, field reports, interviews, peer and self-assessments, writing samples, exhibitions,
portfolio assessment, project and product assessments, and the like.

Why should assessment be integrated with the teaching and learning process? If there is
no assessment, what problem(s) will arise? Explain.

4. What is Evaluation?
In educational literature, the concepts ‘assessment’ and ‘evaluation’ have been used with
some confusion. Some educators have used them interchangeably to mean the same thing.
Others have used them as two different concepts. Even when they are used differently there is
too much overlap in the interpretations of the two concepts. Dear learner, in this section, you
will learn about educational evaluation with reference to its meaning and purpose.

How do you define evaluation? Is evaluation really important? Discuss your personal
point of view with examples.

Evaluation is the process of making judgments about students' performance, instruction, or
classroom climate based on the data we gather through various assessment techniques. It
occurs after assessment. It depends on the information that has been collected, synthesized
and thoroughly interpreted. It is the point where the teacher is in a position to make informed
judgments or decisions about students' learning progress and the effectiveness of teaching
(Airasian, 1996).

What types of decisions might teachers make based on the information they collect
about the learning and teaching process in general and students' learning in particular?
Please discuss this question with your colleague.

From an instructional standpoint, evaluation may be defined as a systematic process of
determining the extent to which instructional objectives are achieved by the learners. There
are two important aspects of this definition. First, evaluation implies a systematic process,
which omits casual, uncontrolled observation of students. Second, evaluation always
assumes that instructional objectives have been previously identified. Without previously
determined objectives, it is almost impossible to judge the nature and extent of students'
learning progress.

What is the difference between measurement and evaluation? Explain using examples.
Dear learner, by now you have got a clear idea of the terms measurement and evaluation and
their meaning, nature and purpose. You should also be able to realize the difference between
measurement and evaluation. To make the relation between measurement and evaluation
clear, let us take an example:
You ask the shopkeeper to give you 2 meters of cloth. So, what will the shopkeeper do?
He will measure the cloth with the meter scale and cut 2 meters for you. This is called
measurement. Then you will go to the cash counter to pay for it. How will you pay for it?
The cashier will value the cloth as per the rate of one meter of cloth. If the cost of one meter
of cloth is 50 Birr, then your cloth will cost 100 Birr. If the cost of one meter of cloth is 100
Birr, it will cost 200 Birr. So, the value of your cloth depends on the cost of the cloth, and
this cost in turn depends on the quality. This valuing of the cloth is known as evaluation. So,
when we judge the value, quality and worth of something, we evaluate it. When we measure
a quantity in terms of height, weight, length, amount, etc., we call it measurement.

When we analyse the meaning of both terms, we understand the relation and difference
between them. From the meaning, it is clear that evaluation is a much more comprehensive
and inclusive term than measurement, which is limited to quantitative descriptions of pupils;
that is, the results of measurement are always expressed in numbers. For example, 'Hilina
correctly solved 30 of the 40 arithmetic problems.' It does not include qualitative
descriptions of Hilina's work. For example, it does not describe that her work was neat, or
that her score is better than before, or her position in the group. Evaluation, on the other
hand, may include either quantitative or qualitative descriptions or both. In addition,
evaluation always includes value judgments concerning the desirability of the result. In the
above example, evaluation describes that Hilina is making good progress in arithmetic.

The quantitative values that we obtain through measurement will not have any meaning
until they are evaluated against some standard. Educators are constantly evaluating students,
and it is usually done in comparison with some standard. So, we can describe evaluation as
the comparison of what is measured against some defined criteria to determine whether it
has been achieved, whether it is appropriate, whether it is good, whether it is reasonable,
whether it is valid and so forth. Evaluation accurately summarizes and communicates to
parents, other teachers, employers, institutions of further education, and students themselves
what students know and can do with respect to the overall curriculum expectations.

Therefore, evaluation includes both quantitative and qualitative descriptions of students'
behavior plus a value judgment concerning the desirability of that behavior. Thus, evaluation
may or may not be based on measurement (or tests), but when it is, it goes beyond the simple
quantitative description of students' behavior. The following simple expression shows the
comprehensive nature of evaluation and the role of measurement in the evaluation process:

Evaluation = quantitative description of students' behavior (measurement) and/or
qualitative description of students' behavior (non-measurement) plus value judgment
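
As a purely illustrative worked example (the mastery standard here is hypothetical, not
taken from the module): Hilina's measurement is 30 correct out of 40 problems, that is,
(30/40) × 100 = 75%. Evaluation goes a step further: if the teacher's standard for
satisfactory progress is, say, 70%, the teacher compares 75% with 70% and judges that
Hilina is making good progress in arithmetic. The number alone is measurement; the
comparison with a standard and the resulting judgment is evaluation.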

Activity
1. Define the meaning of test, measurement, assessment, and evaluation in your own
terms.
2. Explain the difference between measurement and evaluation, and between assessment
and testing.
3. What do you think would be the consequence if you did not assess and evaluate the
work of the students?
4. Critically elaborate this statement: "Evaluation includes both quantitative and/or
qualitative description plus value judgment".
5. What challenges have you faced in assessing and evaluating students' learning in your
school?

1.2 Purposes of Assessment in Education
One of the first things to consider when planning for assessment is its purpose. Dear
learner, can you mention some of the functions of the tests you took and/or used from your
lower grades up to now? Now, as a teacher, you need to have a clear idea as to what
purposes assessment serves. So, let us discuss the following questions:

Why do we need assessment in education? What do you think are the purposes of
assessment in education? Please, before you proceed to the next section, try to list some
purposes assessment can serve. Discuss this question with the members of your study group.

Assessment must be planned with its purpose in mind. The most important part of
assessment is the interpretation and use of the information that is gleaned for its intended
purpose. Teachers have many purposes for assessing students because they are required to
make a broad range of decisions in their classrooms. Some of these decisions are about the
scholastic characteristics of pupils, while others are about their personal and social
characteristics. Some decisions are about instructional progress, and others are about
institutional adjustment and behavior.

According to Airasian (1994), teachers assess pupils' performance for different purposes.
Overall, assessment serves the following main purposes.

a. Instructional Functions
Airasian (1997) described the instructional process as comprising planning instruction
(i.e., identifying desired pupil behavior changes, selecting materials and organizing learning
experiences into a coherent, reinforcing sequence), delivering instruction to pupils, and
assessing whether pupils have mastered the desired curriculum goals. The instructional
function serves as a source for teachers to clarify and refine meaningful course objectives,
provide feedback to teachers, motivate learners, etc. It further assists the teacher to have an
objective and comprehensive picture of each pupil's/student's progress. This instructional
function of assessment serves the following further purposes:
- Deciding the material to be taught to the students
- Placing students into learning sequences
- Identifying learning difficulties among pupils
- Supervising pupils' progress
- Motivating learners
- Providing feedback to students, teachers, administrators and parents about pupils' progress
- Evaluating the teaching-learning process
- Promoting pupils from one level to another

b. Administrative Functions
This involves the provision of a mechanism of "quality control" for a school or school
system. It provides a useful means for program evaluation and research, facilitating better
selection, classification, placement and certification decisions.

i. Selection decision
An institution or organization decides that some persons are acceptable for certain jobs or
vacancies while others are not; those not acceptable are rejected and are no longer the
concern of the institution or organization. This rejection, and the elimination of those
rejected from immediate institutional concern, is the central feature of a selection decision
(Cronbach and Gleser, 1965, cited in Nitko, 1996).

An educational institution often uses assessments to provide part of the information on
which to base selection decisions. For example, college admissions are often selection
decisions: some candidates, who fulfill the selection requirements, are admitted, while those
who do not fulfill the criteria are not; those not admitted are no longer the college's concern.

ii. Placement decision


Placement decisions differ from selection decisions in that, in selection, rejection is
possible and the institution is not concerned about what happens to those rejected, whereas
in placement decisions persons are assigned to different levels of the same general type of
instruction, education or work, and no one is rejected (Cronbach, 1990; Cronbach and
Gleser, 1965, cited in Nitko, 1996). Suppose that a school places students according to their
ability level: Section A for gifted (high-achieving) students, Section B for average students,
and Section C for slow learners.

That is, students with low reading readiness test scores, for example, cannot be sent home.
They must be placed into appropriate educational levels and taught to read. So, in placement
decisions the institution or the school is responsible for all individual learners; it cannot
reject them. Placement decisions involve "vertical" grouping within a single job, program or
subject, as has been seen above. The subject matter taught to gifted students might be more
complex than that for average learners, and the content taught to slow learners could be
much simpler than that for gifted students. This increase in the content is an indication of
increasing vertical grouping within the same subject.

iii. Classification Decision


These are decisions that involve the assignment of persons to one of several categories,
jobs, or programs that are not necessarily thought of as levels of work or instruction. Like
placement, classification decisions assume that the individual has been selected. Unlike
placement decisions, classification involves "horizontal" grouping in different curricula or
jobs. For example, legislation in the area of educating persons with disabilities has given
legal status to many labels for classifying children with disabilities (i.e., blind, deaf, hard of
hearing, speech disorders, etc.) into one (or more) of a few designated categories. These
categories are unordered (that is, blindness is not higher or lower than deafness).

Classifications are different from selection and placement: classification refers to cases
where the categories are essentially unordered; placement refers to cases where the
categories represent levels of education; and selection refers to cases where students are
accepted or rejected.

iv. Credentialing and Certification Decisions


Credentialing and certification decisions are concerned with assuring that a student has
attained certain standards of learning. Student certification decisions (decisions to give or
not to give certificates) may focus on whether a student has attained minimum competence
or whether a student has attained a high standard of performance.

c. Guidance and Counseling Functions


Assessment also serves as a means to make counseling and guidance decisions. Results
obtained during assessment can be used to assist pupils with educational and vocational
decisions and to guide them in the selection of curricular and extracurricular activities.
Guidance and counseling professionals frequently use tests to help students explore and
choose careers and to direct them in preparing for the careers they select. Professionals
usually use multiple types of tests to obtain relevant information about their clients.
However, it should be noted that a single assessment result is not used for making guidance
and counseling decisions. Rather, a series of assessments is administered, including an
interview, an interest inventory, various aptitude tests, a personality questionnaire, and an
achievement battery. Information from these assessments, along with additional background
information, is discussed with the student during a series of counseling sessions. This
facilitates the student's decision-making processes and is an entrée (i.e., initial part) to the
exploration of different careers. Generally, helping pupils to solve personal and social
adjustment problems requires an objective knowledge of the pupils' abilities, interests,
attitudes and other personal characteristics.

d. Educational Diagnostic and Remedial Decisions Functions


The teacher should make continual diagnostic assessments of groups or individuals,
formally or informally. The diagnostic function of assessment serves to understand
exceptional learners, to identify individuals' special aptitudes and abilities, to help
individuals with problems of sight, hearing and the like, and to identify the strengths and
weaknesses of students.

In carrying out the diagnostic purposes, diagnostic tests are frequently utilized. Diagnosing
persistent learning difficulties involves much more than diagnostic testing, but such tests are
useful in the total process. In diagnosing pupils' difficulties in adding whole numbers, for
example, we would want to include additional problems containing various number
combinations, some of which help pinpoint the specific types of errors each pupil is making.
Because our focus is on the pupil's learning difficulties, diagnostic tests must be constructed
in light of the most common sources of problems encountered by pupils.

Before teachers and counselors can recommend remedial help, they must know in which
specific areas an individual is having difficulty. Sometimes the instruction a teacher or
school has arranged is not effective for an individual student: the student may need special
remedial help or a special prescription relying on alternative methods or materials. That is,
when a student has a problem in mathematics, a teacher may administer a test to identify the
student's weaknesses and to take remedial action.

Activity
1. Explain ways in which assessment serves for instructional purposes.
2. Compare the administrative functions of assessment with diagnostic functions.
3. How can a teacher make assessment important for his students?
4. How can students use assessment to improve their learning?
5. Can you explain how the following persons could benefit from the use of assessment?
Teachers, Students, Parents, Principals and other administrators.

6. What types of decisions might teachers make based on the information they collect
about the learning and teaching process in general and students' learning in particular?
Please discuss this question with your colleague.

1.3 Principles of Assessment

Why do you think it is important to define assessment principles? Explain.


Assessment principles consist of statements highlighting what are considered critical
elements of a system designed to assess student progress. A key idea is that principles help
guide and inform practice. These principles are expressed in terms of elements of a fair
(reliable and valid) assessment system. Thus, each principle introduces an issue that must be
addressed when evaluating a student assessment system. Assessment principles guide the
collection of meaningful information that will help inform instructional decisions, promote
student engagement, and improve student learning.

Before you read this section, think of your school. Is student assessment in your school
guided by principles? If yes, what are the principles? If no, think of and discuss with
colleagues the principles that should guide your student assessment.

Different educators and school systems have developed somewhat different sets of
assessment principles. Miller, Linn and Gronlund (2009) have identified the following
general principles of assessment.
1. Clearly specifying what is to be assessed has priority in the assessment process.
2. An assessment procedure should be selected because of its relevance to the
characteristics or performance to be measured.
3. Comprehensive assessment requires a variety of procedures.
4. Proper use of assessment procedures requires an awareness of their limitations.
5. Assessment is a means to an end, not an end in itself.
Perhaps the assessment principles developed by the New South Wales Department of
Education and Training (2008) in Australia are more inclusive than the principles listed by
other educators. Let us look at these principles and compare them with those developed by
Miller, Linn and Gronlund as described above.
1. Assessment should be relevant. Assessment needs to provide information about students’
knowledge, skills and understandings of the learning outcomes specified in the syllabus.

2. Assessment should be appropriate. Assessment needs to provide information about the
particular kind of learning in which we are interested. This means that we need to use a
variety of assessment methods because not all methods are capable of providing
information about all kinds of learning. For example, some kinds of learning are best
assessed by observing students; some by having students complete projects or make
products and others by having students complete paper and pen tasks. Conclusions about
student achievement in an area of learning are valid only when the assessment method we
use is appropriate and measures what it is supposed to measure.
3. Assessment should be fair. Assessment needs to provide opportunities for every
student to demonstrate what they know, understand and can do. Assessment must be
based on a belief that all learners are on a path of development and that every learner is
capable of making progress. Students bring a diversity of cultural knowledge, experience,
language proficiency and background, and ability to the classroom. They should not be
advantaged or disadvantaged by such differences that are not relevant to the knowledge,
skills and understandings that the assessment is intended to address. Students have the
right to know what is assessed, how it is assessed and the worth of the assessment.
Assessment will be fair or equitable only if it is free from bias or favoritism.
4. Assessment should be accurate. Assessment needs to provide evidence that accurately
reflects an individual student’s knowledge, skills and understandings. That is, assessments
need to be reliable or dependable in that they consistently measure a student’s knowledge,
skills and understandings. Assessment also needs to be objective so that if a second person
assesses a student’s work, they will come to the same conclusion as the first person.
Assessment will be fair to all students if it is based on reliable, accurate and defensible
measures.
5. Assessment should provide useful information. The focus of assessment is to establish
where students are in their learning. This information can be used for both summative
purposes, such as the awarding of a grade, and formative purposes to feed directly into the
teaching and learning cycle.
6. Assessment should be integrated into the teaching and learning cycle. Assessment
needs to be an ongoing, integral part of the teaching and learning cycle. It must allow
teachers and students themselves to monitor learning. From the teacher perspective, it
provides the evidence to guide the next steps in teaching and learning. From the student
perspective, it provides the opportunity to reflect on and review progress, and can provide
the motivation and direction for further learning.
7. Assessment should draw on a wide range of evidence. Assessment needs to draw on a
wide range of evidence. A complete picture of student achievement in an area of learning
depends on evidence that is sampled from the full range of knowledge, skills and
understandings that make up the area of learning. An assessment program that consistently
addresses only some outcomes will provide incomplete feedback to the teacher and
student, and can potentially distort teaching and learning.
8. Assessment should be manageable. Assessment needs to be efficient, manageable and
convenient. It needs to be incorporated easily into usual classroom activities and it needs
to be capable of providing information that justifies the time spent.

Activity 1.3
1. Are there any similarities between the two sets of principles discussed above? What about
their differences?
2. What is the importance of each of these principles to the teaching and learning process?
3. List possible reasons that could hinder applying the principles and identify solutions
for each constraint, to make your assessment comprehensive and effective.
4. Based on your experiences, compare and contrast the extent each of these principles were
followed at secondary and university education levels.

1.4 Basic Assumptions of Assessment

When planning to assess students, what assumptions should one hold in mind? What
things should be kept in mind when preparing assessment tools for assessing
students?

Angelo and Cross (1993) have listed seven basic assumptions of classroom assessment
which are described as follows:

1. The quality of student learning is directly, although not exclusively, related to the
quality of teaching. Therefore, one of the most promising ways to improve learning is to
improve teaching. If assessment is to improve the quality of students' learning, both
teachers and students must become personally invested and actively involved in the
process.
Reflection: What should be the roles of students and teachers in classroom assessment so
that it helps students' learning?

2. To improve their effectiveness, teachers need first to make their goals and objectives
explicit and then to get specific, comprehensible feedback on the extent to which
they are achieving those goals and objectives. Effective assessment begins with clear
goals. Before teachers can assess how well their students are learning, they must identify
and clarify what they are trying to teach. After teachers have identified the specific teaching
goals they wish to assess, they can better determine what kind of feedback to collect.
3. To improve their learning, students need to receive appropriate and focused
feedback early and often; they also need to learn how to assess their own learning.
Reflection: How do you think feedback and self-assessment will help to improve
students’ learning?
4. The type of assessment most likely to improve teaching and learning is that
conducted by teachers to answer questions they themselves have formulated in
response to issues or problems in their own teaching. To best understand their
students’ learning, teachers need specific and timely information about the particular
individuals in their classes. As a result of the different students’ needs, there is often a gap
between assessment and student learning. One goal of classroom assessment is to reduce
this gap.
Reflection: How does classroom assessment help to reduce this gap between
assessment and student learning?
5. Systematic inquiry and intellectual challenge are powerful sources of motivation,
growth, and renewal for teachers, and classroom assessment can provide such
challenge. Classroom assessment is an effort to encourage and assist those teachers who
wish to become more knowledgeable, involved, and successful.
6. Classroom assessment does not require specialized training; it can be carried out by
dedicated teachers from all disciplines. To succeed in classroom assessment, teachers
need not only a detailed knowledge of the discipline but also dedication to teaching, and
the motivation to improve.
7. By collaborating with colleagues and actively involving students in classroom
assessment efforts, teachers (and students) enhance learning and personal
satisfaction. By working together, all parties achieve results of greater value than those
they can achieve by working separately.
Reflection: Can you explain how teachers’ collaboration with colleagues can be
more effective in enhancing learning and personal satisfaction than working alone?

1.5 Assessment, Learning, and the Involvement of Students
There is considerable evidence that assessment is a powerful process for enhancing
learning. Black and Wiliam (1998) synthesized studies linking assessment and learning, and
from this synthesis they found that the intentional use of assessment in the classroom to
promote learning results in improved student achievement. Classroom assessment promotes
learning when teachers use it in the following ways:
• when they use it to become aware of the knowledge, skills, and beliefs that their
students bring to a learning task; and
• when they use this knowledge as a starting point for new instruction, and monitor
students' changing perceptions as instruction proceeds.

As a secondary school teacher, how do you think you will use the information you
collect through different methods of assessment to improve the teaching and learning
process?

When learning is the goal, teachers and students collaborate and use ongoing assessment and
pertinent feedback to move learning forward. When classroom assessment is frequent and
varied, teachers can learn a great deal about their students. They can gain an understanding of
students’ existing beliefs and knowledge, and can identify incomplete understandings, false
beliefs, and immature interpretations of concepts that may influence or distort learning.
Teachers can observe and probe students’ thinking over time, and can identify links between
prior knowledge and new learning.

Learning is also enhanced when students are encouraged to think about their own learning, to
review their experiences of learning and to apply what they have learned to their future
learning. Assessment provides the feedback loop for this process. When students (and
teachers) become comfortable with a continuous cycle of feedback and adjustment, students
begin to internalize the process of standing outside their own learning and considering it
against a range of criteria, not just the teacher’s judgment about quality or accuracy. When
students engage in this ongoing metacognitive experience, they are able to monitor their
learning along the way, make corrections, and develop a habit of mind for continually
reviewing and challenging what they know.

Assessment also enhances students' learning by increasing their motivation. Motivation is
essential for students' engagement in their learning. The higher the motivation, the more time
and energy a student is willing to devote to any given task. Even when a student finds the
content interesting and the activity enjoyable, learning requires sustained concentration and
effort.

How do you think assessment will help to increase students’ motivation?


According to current cognitive research, people are motivated to learn by success and
competence. When students feel ownership and have choice in their learning, they are more
likely to invest time and energy in it. Assessment can be a motivator, not through reward and
punishment, but by stimulating students’ intrinsic interest. Assessment can enhance student
motivation by:
• emphasizing progress and achievement rather than failure
• providing feedback to move learning forward
• reinforcing the idea that students have control over, and responsibility for, their own
learning
• building confidence in students so they can and need to take risks
• being relevant, and appealing to students’ imaginations
• providing the scaffolding that students need to genuinely succeed
When students learn, they make meaning for themselves, and they approach learning tasks in
different ways. They bring with them their own understanding, skills, beliefs, hopes, desires,
and intentions. It is important to consider each individual student’s learning, rather than talk
about the learning of the class. Assessment practices lead to differentiated learning when
teachers use them to gather evidence to support every student’s learning, every day in every
class. The learning needs of some students may require individualized learning plans.

One way in which we can involve our students in the assessment process is to establish the
standards or assessment criteria with them. This will help students understand what is to be
assessed. Working with students to develop assessment tools is a powerful way to help
students build an understanding of what a good product or performance looks like. It helps
students develop a clear picture of where they are going, where they are now and how they
can close the gap. This does not mean that each student creates his or her own assessment
criteria. You, as a teacher, have a strong role to play in guiding students to identify the
criteria and features of understandings you want your students to develop.

Another important aspect is to involve students in trying to apply the assessment criteria
for themselves. The evidence is that, through trying to apply criteria or marking with a
model answer, students gain much greater insight into what is actually being required, and
subsequently their own work improves in the light of this. An additional benefit is that it
may enable students to be provided with more learning activities on which they will receive
feedback from the teacher.

As a teacher, what competencies do you think you should have in the area of
assessment? Write down your ideas and compare them with the work of another colleague.

In the American education system, a list of seven standards for teacher competence in
educational assessment of students has been developed. These standards for teacher
competence in student assessment have been developed with the view that student assessment
is an essential part of teaching and that effective teaching cannot exist without appropriate
student assessment. The seven standards articulating teacher competence in the educational
assessment of students are described below.

1. Teachers should be skilled in choosing assessment options appropriate for instructional
decisions. They need to be well-acquainted with the kinds of information provided by a
broad range of assessment alternatives and their strengths and weaknesses. In particular,
they should be familiar with criteria for evaluating and selecting assessment methods in
light of instructional plans.
2. Teachers should be skilled in developing assessment methods appropriate for
instructional decisions. Assessment tools may be valid (accurate and fair) or invalid.
Teachers must be able to determine the quality of the assessment tools they develop.
3. Teachers should be skilled in administering, scoring, and interpreting the results of
assessment methods. It is not enough that teachers are able to select and develop good
assessment methods; they must also be able to apply them properly.
4. Teachers should be skilled in using assessment results when making decisions about
individual students, planning teaching, developing curriculum, and school improvement.

5. Teachers should be skilled in developing valid student grading procedures that use pupil
assessments. Grading students is an important part of professional practice for teachers.
6. Teachers should be skilled in communicating assessment results to students, parents,
other lay audiences, and other educators. Furthermore, teachers will sometimes be in a
position that will require them to defend their own assessment procedures and their
interpretations of them. At other times, teachers may need to help the public to interpret
assessment results appropriately.
7. Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate
assessment methods and uses of assessment information. Teachers must be well-versed in
their own ethical and legal responsibilities in assessment. In addition, they should also
attempt to have the inappropriate assessment practices of others discontinued whenever
they are encountered.
 Activity 1.5
1. What should be the roles of students and teachers in classroom assessment so that it
helps students' learning?
2. As a secondary school teacher how do you think you can involve your students in the
assessment process?
3. In what ways can students benefit if they are involved in the assessment process?
4. How do you think feedback and self-assessment will help to improve students’ learning?
5. Discuss and report on the importance and use of having standards of teacher competence
in assessment for a particular school and the whole education system in general.

Unit Summary
In this unit, the major points discussed were the following.
• Test, measurement, assessment and evaluation are concepts that are frequently used in the
area of educational assessment and evaluation, often with varying meanings and some
confusion. However, although they overlap, they vary in scope and have different
meanings.
• Assessment provides information for teachers and school principals to make different
decisions about their students. The decisions could be instructional management
decisions, selection decisions, placement decisions, counseling and guidance decisions,
classification decisions, or certification decisions.
• The general principles of assessment serve as guidelines: determining the purpose of
assessment, selecting an appropriate assessment technique for the identified purpose, using a
variety of assessment techniques, and knowing the limitations of each technique.
• Assessment should be designed in such a way that it will elicit information about
students' progression towards the educational objectives. With regard to the learner,
assessment is aimed at providing information that will help us make decisions concerning
remediation, enrichment, selection, exceptionality, progress and certification. With regard
to teaching, assessment provides information about the attainment of objectives, the
effectiveness of teaching methods and learning materials.
• Assessment is an integral process of the teaching and learning process and is an important
tool for enhancing learning.
• In order to maximize the benefits students can get out of assessment, they should be
involved in the assessment process.
• There are certain assessment competencies that teachers need to possess so as to
effectively carry out their professional responsibilities.

SELF-TEST EXERCISES
1. In order to check your understanding of what you have learned in this unit, answer the
following questions and compare your answers with what is discussed in the material. If
you couldn't answer any one of these questions adequately, you need to go back and
restudy the material wherever your understanding is weak.
a) Define the concepts test, measurement, assessment and evaluation.
b) Describe the main purposes of assessment.
c) Describe the principles of assessment.
d) Discuss the importance of assessment for learning and teaching.
e) Identify the strategies that can be used to involve students in the assessment of
their learning.
f) Mention the major assessment competencies that professional teachers are
expected to possess.
2. Critically examine the following questions and give adequate explanation for your
answer.
a) Establish the interrelationships among the terms assessment, measurement, test and
evaluation.
b) Would you like to follow continuous and comprehensive evaluation in your school? If
so, why? Discuss the importance of continuous and comprehensive evaluation.
c) Assessment and evaluation are an integral part of the educational process. Justify the
statement.
d) How important is assessment in the determination of students’ academic success?
How frequently should it be applied in your class to determine students’ learning
progress?
e) How do you think one makes decisions about students' learning progress? Do you
think that the type of decisions you make would affect your students’ learning? How?
f) In small groups, discuss the extent to which each of the purposes of assessment has
been served by the different assessment activities you have gone through while you
were at your respective universities and report the results of your discussions.
g) Find Ethiopia's standards of teacher competence in assessment from your nearby
educational organization (woreda, zone or regional education office), compare these
competencies with the American standards, and report the similarities and
differences.

UNIT TWO
ASSESSMENT STRATEGIES, METHODS AND TOOLS

INTRODUCTION
In the previous unit, you have been introduced to the major concepts of educational
assessment and evaluation. You also learned about the purposes and principles of assessment.
In this unit, you will learn about various assessment strategies that can be used in the context
of secondary education. You will also learn about methods and tools of assessment.

Unit Objectives
At the end of this unit, you should be able to:
• Identify relevant assessment strategies, methods and tools to select appropriate ones.
• Describe the informal assessment and formal assessment types used in the classrooms.
• Explain the major differences and similarities between formative assessment and
summative assessment.
• Compare criterion referenced assessment and norm referenced assessment types.
• Describe the purpose of assessment used prior, during and after instruction.
• Develop different types of assessment tools as per the principles of assessment.
• Evaluate assessment tools by identifying relevant criteria and what they measure.

2.1 Types of Assessment

As a secondary school teacher, what type(s) of assessment do you practice in your school?
How do you think you will use the information you collect through different methods of
assessment to improve the teaching and learning process?

There are different approaches to conducting assessment in the classroom. Here we are going
to see three pairs of assessment typologies: namely, formal versus informal, criterion
referenced versus norm referenced, formative versus summative assessments.

1. Formal and Informal Assessment

Assessment can be either formal or informal. Both types are essential to effective instruction.
"Formal" and "informal" are not technical psychometric terms; therefore, there are no
uniformly accepted definitions. "Informal" is used here to indicate techniques that can easily
be incorporated into classroom routines and learning activities. Let us try to understand their
differences from the following definitions.

Formal Assessment: It consists of the standardized tests and on-demand assessments
administered to all students in specific grades as part of the statewide and local district
assessment programs. These examinations measure the mastery of critical skills and concepts
at key developmental milestones. Individual progress is shown, but students are also
compared to others in their grade. A formal assessment is given a numerical score or grade
based on student performance. We will deal more on formal assessment strategies,
particularly on tests in a later section.

Informal Assessment: Informal assessment techniques
can be used at any time without interfering with instructional time. Their results are
indicative of the student's performance on the skill or subject of interest.

Can you think of the informal assessment strategies that you can use in your classes?
What informal assessment strategies have your teachers used when you were a student?

Informal assessment consists of the evidence teachers collect in class on a continuous basis to
measure the progress of their students in mastering the skills and contents taught. It provides
continuous feedback to students, teachers, and parents. Each student is compared only to
his/her own prior level of achievement. Formal tests assume a single set of expectations for
all students and come with prescribed criteria for scoring and interpretation. Informal
assessment, on the other hand, requires a clear understanding of the levels of ability the
students bring with them. Only then may assessment activities be selected that students can
attempt reasonably. Informal assessment seeks to identify the strengths and needs of
individual students without regard to grade or age norms. Informal assessments allow
teachers to track the ongoing progress of their students regularly and often. These
assessments are designed to determine whether students are learning what is being taught, for
the purpose of adjusting instruction. By using informal assessments, teachers can target
students' specific problem areas, adapt instruction, and intervene earlier rather than later.

2. Criterion-referenced and Norm-referenced Assessments

How the results of tests and other assessment procedures are interpreted also provides a
method of classifying these instruments. There are two ways of interpreting student
performance – criterion-referenced and norm-referenced.

a) Criterion-referenced Assessment: This type of assessment allows us to quantify the
extent to which students have achieved the goals of a unit of study or a course. It is carried out
against previously specified criteria and performance standards. Where a grade is
assigned, it is assigned on the basis of the standard the student has achieved on each of
the criteria. This type of assessment is most appropriate for quickly assessing what
concepts and skills students have learned from a segment of instruction. Criterion
referenced classrooms are mastery-oriented, informing all students of the expected
standard and teaching them to succeed on related outcome measures. Criterion
referenced assessments help to eliminate competition and may improve cooperation.

b) Norm-referenced Assessment: This type of assessment has as its end point the
determination of student performance based on a position within a cohort of students –
the norm group. This type of assessment is most appropriate when one wishes to make
comparisons across large numbers of students or important decisions regarding student
placement and advancement. For example, students’ results in grade 8 national exams in
our country are determined based on their relative standing in comparison to all other
students who have taken the exam. Thus, when we say that a student is at the 80th
percentile, it does not mean that the student scored an average of 80%. Rather, it means
that the student's score stands above those of 80% of the students, while the remaining
roughly 20% of students scored above that particular student. Students'
assignment of ranks is also another example of norm-referenced interpretation of
students’ performances.
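To make this norm-referenced interpretation concrete, here is a minimal sketch in Python
(purely illustrative; the function name and the sample scores are invented for this example
and are not part of the module):

```python
def percentile_rank(score, norm_group):
    """Percentage of scores in the norm group that fall below the given score."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

# A hypothetical norm group of ten exam results
scores = [52, 61, 64, 70, 73, 77, 80, 83, 86, 91]
print(percentile_rank(86, scores))  # 80.0: the student outscored 80% of the group
```

Note that the percentile rank of 86 says nothing about the raw mark itself; it only locates the
student relative to the norm group, which is exactly the norm-referenced interpretation
described above.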

To summarize, the criterion-referenced assessment emphasizes description of the student's
performance, and the norm-referenced assessment emphasizes discrimination among
individual students in terms of relative level of learning.

3. Formative and Summative Assessments


Assessment procedures can be classified according to their functional role during classroom
instruction. One such classification system follows the sequence in which assessment
procedures are likely to be used in the classroom. The most commonly referred to and used
categories in this regard are formative assessment and summative assessment.

Can you differentiate the concepts of formative and summative assessment? Please, try
to describe them before you proceed studying the following section.

a) Formative Assessment: Formative assessments are used to shape and guide classroom
instruction. They can include both informal and formal assessments and help us to gain a
clearer picture of where our students are and what they still need help with. They can be
given before, during, and even after instruction, as long as the goal is to improve
instruction.

Formative assessments are ongoing assessments, reviews, and observations in a
classroom. They serve a diagnostic function for both students and teachers. Students
receive feedback that they can use to adjust, improve their performance or other aspects
of their engagement in the unit such as study techniques. Teachers receive feedback on
the quality of learners’ understandings and consequently, can modify their teaching
approaches to provide enrichment or remedial activities to more effectively guide
learners. For example, if a teacher observes that some students do not grasp a concept,
he/she can design a review activity to reinforce the concept or use a different instructional
strategy to re-teach it. Teachers can conduct formative assessment at any point in a unit of
study.

Formative assessment is also known as 'assessment for learning'. The idea behind this
term is that the primary purpose of assessment should be to enhance students' learning.
Another name associated with the concept of formative assessment is 'continuous
assessment'. Continuous assessment (as opposed to terminal
assessment) is based on the premise that if assessment is to help students’ improvement in
their learning and if a teacher is to determine the progress of students towards the
achievement of the learning goals, it has to be conducted on a continuous basis. Thus,
continuous assessment is a teaching approach as well as a process of deciding to what
extent the educational objectives are actually being realized during instruction. In schools,
continuous assessment of learning is usually carried out by teachers on the basis of
impressions gained as they observe their students at work or by various kinds of tests
given periodically. Therefore, each decision is based on various types of information that
are gathered through different assessment methods at different times by teachers.

In order to assess your students' understanding, there are various strategies that you
can use. Can you mention some of the strategies that you can use to assess your students
for formative purposes? Please, try to mention as many strategies as you can.

The following are some of the strategies of formative assessment you can employ in your
classrooms:
- Observations during in-class activities of students, non-verbal feedback during lecture
- Homework exercises as review for exams and class discussions
- Question and answer sessions, both formal and informal
- Conferences between the instructor and student at various points in the semester
- In-class activities where students informally present their results
- Student feedback collected by periodically answering specific questions about the
instruction and their self-evaluation of performance and progress

Tests and homework can also be used formatively if teachers analyze where students are in
their learning and provide specific, focused feedback regarding performance and ways to
improve it.
b) Summative Assessment: Summative assessment typically comes at the end of a course
(or unit) of instruction. Its purpose is to determine the extent to which the instructional
objectives have been achieved. It is used primarily for assigning course grades or for
certifying pupil mastery of the intended learning outcomes. Although the main purpose of
summative assessment is grading or the certification of pupil achievement, it also
provides information for judging the appropriateness of the course objectives and the
effectiveness of the instruction. It evaluates the quality of students’ learning and assigns a
mark to that students’ work based on how effectively learners have addressed the
performance standards and criteria.
Because they are spread out and occur after instruction every few weeks, months, or once a
year, summative assessments are tools to help evaluate the effectiveness of programs, school
improvement goals, alignment of curriculum, or student placement in specific programs. Here
are some examples of summative assessments: state assessments; teacher-made achievement
tests such as end-of-unit or chapter tests and end-of-term or semester exams; ratings of various
types of performance; and assessment of products (reports, drawings, etc.). Assessment tasks
conducted during the progress of a semester may be regarded as summative in nature if they
only contribute to the final grades of the students. The techniques used in summative
assessment are determined by the instructional goals.
Types of Summative Assessment techniques
- Examinations (major, high-stakes exams)
- Final examination (a truly summative assessment)
- Term papers (drafts submitted throughout the semester would be a formative
assessment)
- Projects (project phases submitted at various completion points could be formatively
assessed)
- Portfolios (could also be assessed during its development as a formative assessment)
A particular assessment task can be both formative and summative. For example, students
could complete unit one of their Module and complete an assessment task for which they
earned a mark that counted towards their final grade. In this sense, the task is summative.
They could also receive extensive feedback on their work. Such feedback would guide
learners to achieve higher levels of performance in subsequent tasks. In this sense, the task is
formative – because it helps students to form different approaches and strategies to improve
their performance in the future.

Formative and summative assessment should be incorporated into programmes to ensure that
the purposes of assessment are adequately addressed. In a balanced assessment system, both
summative and formative assessments are an integral part of information gathering.
Formative assessments are commonly said to be assessment for learning because educators
use the results to modify and improve teaching techniques during an instructional period,
while summative assessments are said to be assessment of learning because they evaluate
academic achievement at the conclusion of an instructional period.

Generally, formative assessment is the short-term collection and use of evidence to guide
learning. It takes place during instruction and is designed primarily to give feedback to
inform further development, whereas summative assessment sums up a student’s
achievements at the end. Formative assessment is about improving, so feedback is
crucial; summative assessment is about deciding, and it is too often emphasised at the
expense of real learning. Formative assessment has been described as the journey, not the outcome.

2.2 Phases of Assessment in the Instructional Process
A teacher's complete understanding of assessment requires the teacher to realize that
assessment information should be used all through the teaching process. In other words,
assessment really must be continuous. In line with this, assessment expert McMillan (1997,
2000) believes that competent teachers frequently evaluate their students in relation to
learning goals and adapt their instruction accordingly. If we divide the teaching process into
three stages, we can describe assessment in each stage along with the main purpose for the
assessment.

With regard to the stages of the teaching and learning process, assessment can be
conducted prior to instruction, during instruction, and after instruction. What are the
purposes of assessment in each stage and what assessment methods can be used? Discuss
with your colleagues.

A. Assessment Prior to Instruction


Assessment prior to instruction provides a teacher with information about individual
differences among students as well as an understanding of the background or prior
knowledge of the class as a whole. It helps to gather information about what students already
know about the upcoming unit and what misconceptions they may have. These assessment
activities provide the basis for planning instruction.

Much of pre-instructional assessment is informal observation. In the first several weeks of
school, we will have numerous opportunities to observe students' characteristics and
behaviour. Be sensitive to whether a student is shy or outgoing, has a good or weak
vocabulary, speaks and listens effectively, is considerate of others or is egocentric, engages in
appropriate or inappropriate behaviour, and so on. Also focus on the student’s non-verbal
behaviour for cues that might reveal nervousness, boredom, frustration, or lack of
understanding. Some other additional points are:
o Understanding students’ cultural background, interests, skills, and abilities as they
apply across a range of learning domains and/or subject areas,
o Understanding students' motivations and their interests in specific class content;
o Clarifying and articulating the performance outcomes expected of students, and
o Planning instruction for individuals or groups of students.

B. Assessment During Instruction
Assessment during instruction provides information about the overall progress of the whole
class as well as specific information about individual students. These assessment activities
provide the basis for monitoring progress during learning. It helps to gather information
about what, and how well, students are learning so far. Use this information to guide
changes that should be made in the lesson plans or to guide remediation
or re-teaching.

Assessment during instruction takes place at the same time as we make many other decisions
about what to do, say, or ask next to keep the classroom running smoothly and help students
actively learn. It also requires listening to student answers, observing other students for
indications of understanding or confusion, framing the next question, and scanning the class
for possible misbehaviour. At the same time, the teacher must be aware of the pace of the
activity, the sequence of choosing students to answer, the relevance and quality of the
answers, and the logical developments of the content. When the class is divided into small
groups, the teacher might need to monitor and regulate several different activities
simultaneously.
Purposes of assessment during instruction include:
• Monitoring student progress toward instructional goals;
• Identifying gains and difficulties students are experiencing in learning and performing;
• Adjusting instruction;
• Giving contingent, specific, and credible praise and feedback;
• Motivating students to learn; and
• Judging the extent of students' attainment of instructional outcomes.

C. Assessment After Instruction


Assessment after instruction provides information about how well our students have mastered
the material, whether students are ready for the next unit, what grades students should be
given, what comments we should make to parents, and how we should adapt our instruction
(McMillan, 1997, 2000). It involves gathering information about how well students have
learned the material that was taught. Use the information to assign grades or re-teach students
who have fallen behind.
It is after instruction that more formal types of assessment are often used.
• Describing the extent to which each student has attained both short- and long-term
instructional goals;
• Communicating strengths and weaknesses based on assessment results to students,
and parents;
• Recording and reporting assessment results for school-level analysis, evaluation
and decision making;
• Analyzing assessment information gathered before, and during instruction to
understand each student’s progress to date and to inform future instructional
planning;
• Evaluating the effectiveness of instruction; and
• Evaluating the effectiveness of the curriculum and materials in use.

2.3 Assessment Strategies


Assessment strategy refers to those assessment tasks (methods/approaches/activities) in
which students are engaged to ensure that all the learning objectives of a subject, a unit or a
lesson have been adequately addressed. Assessment strategies range from informal, almost
unconscious, observation to formal examinations. Although different subject areas may
differ somewhat in the assessment strategies they use, there are, in general, a variety
of methods that can be used in most subjects.

When selecting assessment strategies in our subject areas, there are a number of things that
we have to consider. First and foremost, it is important that we choose the assessment
technique appropriate for the particular behavior being assessed. We have to use a strategy
that can give students an opportunity to demonstrate the kind of behavior that the learning
outcome demands. Assessment strategies should also be related to the course/subject material
and relevant to students’ lives. Therefore, we have to provide assessment strategies that relate
to students’ future work.

There are many different ways to categorize learning goals for students. Categorizing helps
us to thoroughly think through what we want students to know and be able to do. One way in
which the different learning outcomes that we want our students to develop can be
categorized is presented as follows:
• Knowledge and understanding: What facts do students know outright? What
information can they retrieve? What do they understand?
• Reasoning proficiency: Can students analyze, categorize, and sort into component
parts? Can they generalize and synthesize what they have learned? Can they
evaluate and justify the worth of a process or decision?
• Skills: We have certain skills that we want students to master such as reading
fluently, working productively in a group, making an oral presentation, speaking a
foreign language, or designing an experiment.
• Ability to create products: Another kind of learning target is student-created
products - tangible evidence that the student has mastered knowledge, reasoning,
and specific production skills. Examples include a research paper, a piece of
furniture, or artwork.
• Dispositions: We also frequently care about student attitudes and habits of mind,
including attitudes toward school, persistence, responsibility, flexibility, and desire
to learn.

In groups, discuss and identify the assessment strategies that you consider are best for
assessing each of these categories of learning goals and compare your work with that of
other groups.

From among the various assessment strategies that can be used by classroom teachers, some
are described below for your consideration as student teachers.

▪ Classroom presentations: A classroom presentation is an assessment strategy that
requires students to verbalize their knowledge, select and present samples of finished
work, and organize their thoughts about a topic in order to present a summary of their
learning. It may provide the basis for assessment upon completion of a student’s project or
essay. For example, students can be made to present a report after an educational visit.

What other educational activities can you imagine in your subject area where
students can present their works?

▪ Conferences: A conference is a formal or informal meeting between the teacher and a
student for the purpose of exchanging information or sharing ideas. A conference might be
held to explore the student’s thinking and suggest next steps; assess the student’s level of
understanding of a particular concept or procedure; and review, clarify, and extend what
the student has already completed.

What advantages do you think conference as a method of assessment will have?

▪ Exhibitions/Demonstrations: An exhibition/demonstration is a performance in a public
setting, during which a student explains and applies a process, procedure, etc., in concrete
ways to show individual achievement of specific skills and knowledge.

What type of objectives do you think this assessment strategy could serve to measure?

▪ Interviews: An interview can be used for assessment purposes in educational settings. In
such applications, an interview is a face-to-face conversation in which the teacher and student use
inquiry to share their knowledge and understanding of a topic or problem. This form of
assessment can be used by the teacher to:
• explore the student’s thinking;
• assess the student’s level of understanding of a concept or procedure; and
• gather information, obtain clarification, determine positions, and probe for
motivations.

▪ Observation: Observation is a process of systematically viewing and recording students
while they work, for the purpose of making instructional decisions. Observation can take
place at any time and in any setting. It provides information on students' strengths and
weaknesses, learning styles, interests, and attitudes. Observations may be informal or
highly structured, and incidental or scheduled over different periods of time in different
learning contexts.

▪ Performance tasks: During a performance task, students create, produce, perform, or
present works on "real world" issues. The performance task may be used to assess a skill
or proficiency, and provides useful information on the process as well as the product.
Please mention some examples of performance tasks that students can do in your subject
area.

▪ Portfolios: A portfolio is a collection of samples of a student's work over time. It offers a
visual demonstration of a student's achievement, capabilities, strengths, weaknesses,
knowledge, and specific skills, over time and in a variety of contexts. For a portfolio to
serve as an effective assessment instrument, it has to be focused, selective, reflective, and
collaborative. Portfolios can be prepared for different subjects in any educational level.

What type of materials can be included in the portfolio of students in relation to your
subject?

▪ Questions and answers: Perhaps, this is a widely used strategy by teachers with the
intention of involving their students in the learning and teaching process. In this strategy,
the teacher poses a question and the student answers verbally, rather than in writing. This
strategy helps the teacher to determine whether students understand what is being, or has
been, presented; it also helps students to extend their thinking, generate ideas, or solve
problems. Strategies for effective question and answer assessment include:
• Apply a wait time or 'no hands-up rule' to provide students with time to think after a
question before they are called upon randomly to respond.
• Ask a variety of questions, including open-ended questions and those that require
more than a right or wrong answer.

During what time of the lesson do you think question and answer strategy will be more
useful? Why?

▪ Students' self-assessments: Self-assessment is a process by which the student gathers
information about, and reflects on, his or her own learning. It is the student's own
assessment of personal progress in terms of knowledge, skills, processes, or attitudes. Self-
assessment leads students to a greater awareness and understanding of themselves as
learners.

▪ Checklists, Rating Scales and Rubrics: These are tools that state specific criteria and
allow teachers and students to gather information and to make judgments about what
students know and can do in relation to the outcomes. They offer systematic ways of
collecting data about specific behaviors, knowledge and skills.

▪ Checklists: usually offer a yes/no format in relation to student demonstration of specific
criteria. They may be used to record observations of an individual, a group or a whole
class.

▪ Rating Scales allow teachers to indicate the degree or frequency of the behaviors, skills
and strategies displayed by the learner. Rating scales state the criteria and provide three or
four response selections to describe the quality or frequency of student work.

▪ Rubrics: use a set of specific criteria to evaluate a student's performance. They consist of a
fixed measurement scale and detailed descriptions of the characteristics for each level of
performance. These descriptions focus on the quality of the product or performance and
not the quantity. Rubrics may be used to assess individuals or groups and, as with rating
scales, may be compared over time (a small illustrative sketch follows the list of purposes
below).

The purpose of checklists, rating scales and rubrics is to:
- provide tools for systematic recording of observations
- provide tools for self-assessment
- provide samples of criteria for students prior to collecting and evaluating data on their
work
- record the development of specific skills, strategies, attitudes and behaviors necessary
for demonstrating learning
- clarify students' instructional needs by presenting a record of current
accomplishments.
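As a concrete contrast between a checklist's yes/no format and a rubric's fixed scale, here is a
minimal sketch in Python (purely illustrative; the criteria, level descriptors, data and function
name are invented for this example and are not part of the module):

```python
# A checklist records only yes/no per criterion.
checklist = {"title page included": True, "references cited": False}

# A tiny, invented rubric: a fixed 1-4 scale with a descriptor for each level.
rubric = {
    "organization": {1: "no clear structure", 2: "some structure",
                     3: "mostly logical order", 4: "clear and logical throughout"},
    "evidence":     {1: "no support given", 2: "weak support",
                     3: "adequate support", 4: "strong, relevant support"},
}

def rubric_score(ratings):
    """Sum the teacher's level ratings, rejecting any rating that is off the scale."""
    for criterion, level in ratings.items():
        if level not in rubric[criterion]:
            raise ValueError(f"{criterion}: level {level} is not on the scale")
    return sum(ratings.values())

print(rubric_score({"organization": 3, "evidence": 4}))  # 7 out of a possible 8
```

The design point is that the rubric's descriptors concern the quality of the work at each level,
while the numeric scale merely labels those levels; the checklist, by contrast, records only
whether each criterion was met.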

In what specific instances can these assessment strategies (rating scales,
checklists and rubrics) be used in your area of study? Think of specific examples and
share your ideas with your colleague.

▪ One- Minute paper: During the last few minutes of the class period, you may ask
students to answer on a half-sheet of paper: "What is the most important point you learned
today?" and, "What point remains least clear to you?" The purpose is to obtain data about
students' comprehension of a particular class session. Then you can review responses and
note any useful comments. During the next class periods you can emphasize the issues
illuminated by your students' comments.

▪ Muddiest Point: This is similar to ‘One-Minute Paper’ but only asks students to describe
what they didn't understand and what they think might help. It is an important technique
that will help you to determine which key points of the lesson were missed by the students.
Here also you have to review the responses before the next class meeting and use them to
clarify, correct, or elaborate.

▪ Student- generated test questions: You may allow students to write test questions and
model answers for specified topics, in a format consistent with course exams. This will
give students the opportunity to evaluate the course topics, reflect on what they
understand, and what good test items are. You may evaluate the questions and use the
good ones as prompts for discussion.

▪ Tests: This is the type of assessment that you are mostly familiar with. A test requires
students to respond to prompts in order to demonstrate their knowledge (orally or in
writing) or their skills (e.g., through performance). We will learn much more about tests
later in this course module.

 Activity 2.3
1. Let’s say you need to assess student achievement on each of the following learning targets.
Which assessment strategy would you choose? Please, list down your answers with their
justifications.
a) Ability to write clearly and coherently
b) Group discussion proficiency
c) Reading comprehension
d) Proficiency using specified mathematical procedures
e) Proficiency conducting investigations in science
2. Why should assessment be integrated with teaching and learning?
3. What are some techniques of assessment commonly used in your school?
4. Why is it unfair and inaccurate to base students' final grades on only one examination?
5. Have you ever tried to implement continuous assessment in your classroom? What are
some of the challenges you and your students encountered?
6. What are the major problems that teachers have with assessment in your school, and how
do you try to minimize these problems?

Unit Summary
In this unit, you were introduced to different types of assessment approaches, namely formal
versus informal, criterion referenced versus norm referenced, formative versus summative
assessments. You also learned about various assessment strategies. These include: classroom
presentations, exhibitions/demonstrations, conferences, interviews, observations,
performance tasks, portfolios, question and answer, students' self-assessment, checklists,
rating scales and rubrics, one-minute paper, muddiest point, student-generated test questions and
tests.

Self - Check Exercises
It is time to check your understanding of the general concepts discussed in this unit. Read the
following items and answer them by checking one of the boxes under the
alternatives “Yes” or “No”.
Yes No
Can you define informal and formal assessment?  
Can you compare formative and summative assessment?  
Can you compare norm and criterion referenced assessment?  
Can you define purposes of assessment used in each phase?  
Can you discuss different strategies of assessment?  
Is there any box that you marked "No"? If so, go back to your text and read about that
topic again.

Self-Test Exercises
1. State the differences between formative and summative assessment, criterion referenced
and norm referenced assessment, and formal and informal assessment.
2. What conditions do we consider in selecting assessment strategies in our subject?
3. List down the major assessment strategies that you can use in your subject and classify
them as formal and informal strategies.
4. What are the major problems associated with assessing students' learning? What strategies
can we use to minimize these problems?

UNIT THREE
ROLES OF OBJECTIVES IN EDUCATION

INTRODUCTION
In this unit, a revision will be made regarding the meaning of objectives, different forms of
objectives, benefits of stating instructional objectives for students and teachers and the
criteria to be followed when stating them. Furthermore, a substantial concern will be given
for discussing the different domains and levels of educational objectives. The purpose of this
unit is to help you learn about instructional objectives and their relation to learning outcomes
and assessment.
Unit Objectives
By the end of this unit, you are expected to:
• Define the meaning of educational objectives.
• Explain the benefits of stating objectives in the instructional process.
• State SMART objectives following the criteria set for them.
• Differentiate among objectives stated at different levels.
• Classify objectives in relation to domain (cognitive, affective, psychomotor) and level
of complexity called for by the objectives.

Dear learner! What are objectives? What role do they play in the educational process? Can
you define the meaning of educational objectives from what you learned in Subject area
Methods of Teaching and Secondary School Curricula courses? In this text, an attempt has
been made to give answers for these questions.

3.1 Meaning of Educational Objectives


Objectives are expected behavioral outcomes at the end of a particular instructional process.
Educational objectives are statements that describe what a student should be able to do at the
end of a lesson. Objectives play a pivotal role in the process of education and assessment. In
an educational setting, the assessment activities should be based on the stated objectives.
Teachers need to formulate objectives that guide their educational efforts and help them
to focus on student behavior. Objectives help a teacher to plan instruction, guide students'
learning, and provide criteria for evaluating student outcomes. Objectives are logically and
closely tied to assessment since one critical role of assessment is to determine how well
students have learned the intended educational objectives.

There are two considerations in selecting objectives. They are relevance and feasibility.
Relevance refers to whether the objective is based on the need of the society and the learner
and feasibility (realism of objectives) refers to whether the objectives are achievable or not.
Setting unrealistic goals or objectives discourages both students and teachers.

3.2 Aims, Goals and Objectives


Mehrens and Lehmann (1984) indicated that terms such as need, goals, aims, outcomes,
objectives, instructional objectives, and behavioral or performance objectives have been used
almost synonymously by some writers though with sharply different meanings by others.

How are objectives different from goals? Could you explain the conceptual difference in
the meaning of objectives, goals and outcomes? In this section, an attempt has been made to
give answers for these questions.

In the educational process, there are terms that are used frequently to refer to different activities
and attainments. They are aim, goal, and objective. Their definitions are given below.

Distinction between Goals, Objectives and Outcomes


Objectives are defined as a statement of a specific change to be brought about in a learner.
Goal is used as an umbrella term, including both general and specific purposes. Educational
goals are frequently stated in broad terms which give direction and purpose to overall
planning and educational activities. These broad statements are useful for guiding the
educational enterprise.

Example: Every student should acquire communication skills of understanding, speaking,
reading, and writing.
This is a broad statement whose attainment the teacher cannot directly measure. To avoid
this problem, the teacher can simplify educational goals using instructional objectives.
Objectives are also called instructional objectives, learning objectives, performance
objectives, educational objectives, curriculum objectives, achievement targets, and student
outcomes (Airasian, 1996).

In the view of some writers, these terms carry different meanings. A more elaborate and
somewhat different explanation is given below.

1. Goals - are broad, general, long-range statements of educational purpose. A goal is a
statement of intent or vision that is not necessarily measurable. They are primarily used
in policy making and general programme planning.
2. Objectives (Instructional Objectives) - are brief statements that describe desired
learning results or intended outcomes of education. Objectives are more specific than a
goal but may be broad enough to contain several outcomes. They give directions to
education. They describe skills, behaviors, and attitudes that students should be able to
demonstrate after instruction.
3. Outcomes – these are achieved results. These outcomes may involve knowledge
(cognitive), skills (behavioural), or attitudes (affective) that provide evidence that
learning has occurred as a result of a specified course, program activity, or process. A set
of specific learning outcomes describes a sample of the types of performance learners
will be able to exhibit when they have achieved a general instructional objective.
Intended learning outcomes are desired targets, while achieved learning outcomes can be
identified at the end of the learning process, notably through assessment (Cedefop, 2016).
4. Behavioral (or Performance) Objectives - A statement that specifies what observable
performance the learner should be engaged in when we evaluate achievement. Behavioral
objectives require action verbs such as discuss, write, and read. Verbs such as understand
or appreciate are not considered behavioural because one cannot observe a person
"understanding" or "appreciating."
5. Need - is the discrepancy between an objective and the present level of performance.
As can be seen from the above explanations, with the exception of the term 'need', the
difference among these terms is a matter of generality and specificity, as in outcomes versus
goals, behavioral versus instructional objectives, etc.

3.3 Benefits of Stating Instructional Objectives


Dear learner, note that instructional objectives play a key role in the teaching-learning
process. Without instructional objectives, teaching is comparable to a fallen leaf whose
destination depends on the will of the wind. Without them, teachers have nothing to follow
in order to achieve what instruction should achieve, and teaching is reduced to an endeavor
with no definite goal, structure, or purpose.

Dear learner! Could you list down some potential benefits of stating objectives for
assessment purpose? What are the benefits of stating instructional objectives for the teacher
and the students? Discuss with your study group members and organize the major points of
your discussions.

The processes of planning and providing instruction are important activities for classroom
teachers. The instructional process comprises three basic steps. The first is planning
instruction, which includes identifying specific expectations or learning outcomes
(objectives), selecting materials to foster these expectations or outcomes, and organizing
learning experiences into a coherent, reinforcing sequence. The second step involves
delivering the planned instruction to students; that is, teaching them. The third step involves
assessing how well students learn or achieve the expectations or outcomes. Notice that to
carry out the instructional process the three steps should be aligned with one another. That is,
the planned instruction should be logically related to the actual instruction and the
assessments should relate to the plans and instruction.
(Figure: the cyclical relationship among Objectives, Instruction, and Assessment and Evaluation)

It can be seen from the above figure that the three components of the instructional process are
linked in an interdependent manner. This means that objectives are developed on the basis of
feedback acquired from the assessment and evaluation process, and assessment tools in
turn are developed considering the objectives. The objectives are realized through instruction,
and feedback on whether instruction was effective in supporting students to achieve the
expected outcomes relies on evaluation.

There is a circular relationship between objectives, the teaching learning process and
assessment and evaluation. One needs to set tentative objectives and employ an educational
strategy to reach those objectives, that is, provide learning experiences for students. Then one
should measure the degree of attainment of the objectives with the help of a variety of evaluation
procedures. On the basis of the attainment of objectives, the learning experiences can also be
evaluated. In this way, the learning experiences and evaluation procedures can be refined until
the objectives are achieved, or the objectives can be refined according to the learning outcomes
achieved. This cycle continues until the objectives are realized.

For instance, an English teacher who aims to help grade one students identify the letters of the
English alphabet should not jump directly into teaching those letters. This is because some
students might already have learned them at home or in pre-school programs such as
kindergarten. It is therefore suggested to begin from the level where students are, and to this
end it is recommended to use preliminary assessment. On the basis of this, the teacher will
write different objectives for the 'literate' and 'illiterate' children.

As indicated in a general manner above, stating objectives has benefits both for the teacher and
the students. For the teacher, stating instructional objectives gives direction in selecting
contents, learning experiences, instructional aids, assessment items and tools, etc. It also
helps in monitoring pupils' learning progress, selecting or constructing appropriate assessment
procedures, and conveying instructional intent to others. Among these benefits, objectives
help to emphasize major points and reduce non-essential material.

They also tell students what is expected of them. Instructional objectives make definite the
direction in which teaching leads and become the focus of instruction, not only for the
teachers, but also for the students. This will help students to distinguish the major
concepts/themes from the trivial or additional ones. In relation to this, they simplify note
taking and assist students in organizing and studying content materials. They guide the
students to what is expected from them and help them to study important information. Thus,
stating instructional objectives assists the students in studying more efficiently.

According to Nitko (1983), writing specific (instructional) objectives serves the following
purposes:
1. They help teachers and/or curriculum designers make their own educational goals
explicit.
2. They communicate the intent of instruction to students, parents, other teachers, school
administrators and the public.
3. They provide the basis for teachers to analyse what they teach and to construct learning
exercises.
4. They describe the specific performance against which teachers can evaluate the success
of instruction.
5. They can help educators to focus and to clarify discussions of educational goals with
parents (and others).
6. They communicate to students the performance they are expected to learn.
7. They make it easier to individualize instruction.
8. They help teachers to evaluate and improve both instructional procedures and learning
targets.

Nitko adds that writing specific objectives provides the following particular benefits for
classroom assessment purposes:
1. The general planning for an assessment procedure is made easier by knowing the
specific outcomes you wish students to attain.
2. Selecting, designing, and crafting assessment procedures depend on your knowledge
regarding which specific outcomes should be assessed.
3. Evaluating an existing assessment procedure you have already drafted is easier when you
know the specific outcomes.
4. Properly judging the content relevance of an assessment procedure requires you to
know the specific outcomes to be assessed.

3.4 Levels and Forms of Instructional Objectives


Expectations or learning outcomes can range from very general to very specific. According to
Krathwohl and Payne (1971), on the basis of their degree of generality and specificity, there
are three levels of objectives: global, educational, and instructional. These differences call for
different investments of instructional time, different learning activities, and different ranges of
assessment for the achievement of these goals.

How are instructional objectives different from educational and global objectives?
Could you explain the conceptual difference in their meanings? Discuss your answer with
your study group.

• Global objectives, often called “goals,” are broad, complex student learning outcomes
that require substantial time and instruction for their accomplishment. They are very
general and encompass a large number of more specific objectives. They are set at policy
level. The following example is taken from the Ethiopian Education and Training policy
(1994).
Example:
• Cultivate the cognitive, creative, productive and appreciative potential of citizens by
appropriately relating education to environment and societal needs.
• Develop the physical and mental potential and the problem-solving capacity of
individuals by expanding education and in particular by providing basic education for all.
Because they are broadly inclusive, global objectives are rarely used in classroom assessment
unless they are broken down into more narrow objectives.

• Educational objectives represent a middle level of abstraction. These represent
objectives stated at curricular or course levels. For example:
The student will be able to improve skills in test construction.
The student will be able to understand the relevance of different theories of human
development for improving instructional process.
Educational objectives are more specific than global objectives. They are sufficiently narrow
to help teachers plan and focus teaching, and sufficiently broad to indicate the richness of the
objective and to suggest a range of possible student outcomes associated with the objective.
• Instructional objectives - are the least abstract and most specific type of objective.
Examples of instructional objectives include:
• The student will be able to identify the weaknesses of the following matching item.
• The student will be able to explain the difference between goals and educational
objectives.

Instructional objectives focus teaching on relatively narrow topics of learning in a content
area. These concrete objectives are used in planning daily lessons.

3.5 Criteria in Stating Instructional Objectives


There are many ways to state instructional objectives, but not all of them convey clearly what
students are to learn from instruction (Airasian, 1996). Instructional objectives are stated in
terms of what we expect students to be able to do by the end of instruction.

What criteria should be followed when stating specific instructional objectives? What
are the components in stating them? Describe some important criteria and explain each.

In preparing instructional objectives it is possible to focus on different aspects of instruction.
Some teachers prefer to state the objectives in terms of what they are going to do during
instruction. For example, “Demonstrate to pupils how to use the microscope”. This statement
clearly indicates what the teaching activity is, but it is less clear concerning the intended
learning outcomes. Literally speaking, the objective has been achieved when the
demonstration has been completed - whether or not the pupils have learned anything.
Therefore, a more desirable way to state objectives is in terms of what we expect pupils to be
able to do at the end of instruction. Here, in this example, after demonstrating how to use the
microscope, we might expect pupils to be able to do the following:
• Identify the parts of the microscope
• List the steps to be followed in using the microscope
• Describe the precautions in adjusting the microscope
• Demonstrate skill in using the microscope
Statements such as these direct attention to the pupils and to the types of performance they are
expected to exhibit as a result of the instruction. This shifts our focus from the teacher to the
pupil and from the learning experience to the learning outcomes. The shift in focus makes clear
the intent of our instruction and sets the stage for evaluating pupil performance, since it
indicates what evidence we are willing to accept that the instruction has been successful. When
viewing instructional objectives in terms of learning outcomes, it is important to keep in mind
that we are concerned with the products of learning rather than the process of learning.

Dear learner, one of the most recommended ways to write classroom objectives is to use the
ABCD method of writing objectives. The acronym ABCD stands for: A is the audience, B is
the behaviour or the action verb, C is the condition for the objective and D is the degree of
achievement or acceptable criteria. These four components of an objective can be explained
as follows.
A. Audience. In this model, the Audience is always the learner. This is because instruction
is basically provided to bring about behavioural change in students, not the teacher.
Thus, instructional objectives should be written in terms of student behaviour rather than
teacher behaviour. The following example makes this concept clear:
Poor: At the end of this period, the teacher will be able to define measurement.
Good: At the end of this period, the student will be able to define measurement.
B. Behaviour: the overt (observable) activity to be performed by the student. When stating
instructional objectives, it is important to state them in observable and measurable terms
rather than in terms of unobservable mental processes/products. Consider the example
given below:
Poor: At the end of this period, the student will be able to know measurement.
Good: At the end of this period, the student will be able to define measurement.

C. Conditions: These are specifications of contexts under which students will show the
needed behavior. Some of them are: after attending a lecture, following review of a
demonstration, given a case study, after completing the assignment, given a specific
instrument etc. Note how these conditions can be reflected in a lesson-level objective:
After reading the F.D.R.E constitution, the student will be able to clarify how human
nature is conceived.
D. Degree of acceptable performance. This criterion tells students how well they must perform. This part of the objective may be omitted when there is no deviation from standard procedures or protocols. The degree of acceptable performance or proficiency can be indicated by requiring a percentage, ratio, or number of correct responses, or by setting a time limit.
Consider the following example.
At the end of the unit students will be able to solve at least eight out of ten quadratic
equations if given the quadratic formula and no time limit.

Can you identify the three elements of the instructional objectives in this objective?
1. What are the students expected to do? They are expected to solve. This is their
behavior.
2. Under what condition are they expected to solve the problems? The conditions
are: Given the quadratic formula and no time limit.
3. What is the level of performance they are expected to achieve? They are expected
to achieve a performance level of eight out of ten quadratic equations.

To make objectives amenable to assessment and evaluation needs, Mehrens and Lehmann
(1984) rephrased the above standards into the following rules:
1. As stated earlier, objectives should be stated in the form of expected student behavior, not
in terms of teacher activities.
Example
Poor: The teacher will describe the major events in the Ethio-Italian war.
Better: The student will recall the military event that directly led to the outbreak of war between Ethiopia and Italy.
The first objective is not stated in terms of student behavior. It is an activity of teachers.
What are the students expected to do when the teacher describes the major events in the Ethio-Italian war? We do not know. Hence, the second objective is appropriate, because it is stated in terms of student behavior.
2. Objectives should begin with an action verb. The objective should describe the overt behaviour expected and the content, which is the vehicle (i.e., the instructional material or procedure) used to bring about the change. Examples of these verbs are given under each level in Bloom's domains of instructional objectives.
Example
Poor: Student will know the world capitals
Better: Students will be able to recall the capitals of the countries in Eastern Africa.
3. Objectives should be stated in behavioral or performance terms. The terms used should have the same meaning for students and teachers. Objectives must be stated in specific terms if we are to evaluate them adequately.
Example
Poor: The student will see the importance of education (implicit objective)
Better: The student will be able to identify three major benefits of education (instructional objective)
4. Objectives should be stated in unambiguous terms. An objective such as "Practices good citizenship" is poor because the word "good" can mean different things to different
people. To one teacher, it may mean that the students are willing to serve on the student
council. To another teacher, it may mean that the students vote in school elections.
5. Objectives should be unitary; that is, each statement should relate to only a single process.
When we state objectives, we should include only one behavior. Compound objectives, objectives that include two or more behaviors, are likely to lead to inconsistent measurement. At the beginning of a course a teacher may have a particular objective, such as, “The student should be able to recall, comprehend, and apply the four major corrections-for-guessing formulas.” But an item measuring any one of these three behaviors (recall, comprehend, or apply) could be counted as measuring this objective. In addition, the behaviors selected may or may not be measured in proportion to the emphasis given to them in class. If the resulting test is not responsive to the instructional objectives, it is invalid for determining whether those objectives have been accomplished. Another shortcoming of compound objectives is that only the easier portions of the objective may end up being measured, because it is easier to write recall (knowledge) items than application items. Again, the relevance of the test is compromised.

Dear learner! Note: do not state objectives in terms of teacher performance (e.g., teach parts of the microscope), the learning process (e.g., pupil learns parts of the microscope), course content (e.g., student studies parts of the microscope), or two behaviors at once (e.g., student lists and explains parts of the microscope).

3.6 Taxonomy of Educational/Instructional Objectives

Dear learner, in previous sections you learned about instructional objectives and their
importance in the teaching and learning process. In the present section, you are going to learn
about what is called the Taxonomy of Educational Objectives. This is a useful guide for developing a comprehensive list of instructional objectives, as it attempts to identify and classify all possible educational outcomes. The taxonomy comprises three major domains: the cognitive domain (Bloom et al., 1956), the affective domain (Krathwohl, 1964) and the psychomotor domain (Simpson, 1972). The levels in each domain are treated in depth below.

Dear learner, in your previous courses (Subject Methods of Teaching and Secondary
School Curricula) you learned about Bloom’s taxonomy of instructional objectives. Could
you elaborate on the different expectations teachers have of students at each of the levels across the three domains?

A. Cognitive Domain
The cognitive domain, popularly known as Bloom's (1956) Taxonomy of Educational Objectives, is the most renowned description of the levels of cognitive performance. Objectives written in this domain expect students to develop intellectual abilities and skills such as recalling, understanding, problem solving, perception, and reasoning. The levels of this taxonomy are considered to be hierarchical. That is, learners must master lower-level objectives first before they can build on them to reach higher-level objectives. The levels of
the Taxonomy and examples of verbs and instructional objectives are given hereunder.

1. Knowledge
Knowledge is defined as the remembering of previously learned material. This may
involve the recall of a wide range of material, from specific facts to complete theories, but all
that is required is the bringing to mind of the appropriate information. It represents the lowest
level of learning outcomes in the cognitive domain.

At this level students are expected to simply recall (remember) the facts as they were written
on books or as told by the teacher. Items in knowledge category involve the ability to recall
important information like knowledge of specific facts, basic concepts, principles, methods
and procedures, definitions of important terms, familiarity with important concepts etc. Thus,
knowledge level questions are formulated to assess previously learned information:
At the end of this class, the student is expected to:
• List down six levels of Bloom’s taxonomy of the cognitive domain.
The above statement can serve as both a test question and an objective that simply demands
students to recall the hierarchies as taught without any further mental operation. Some of the
action verbs that are used in this level are: cite, define, identify, label, list, match, name, recognize, reproduce, select, state, indicate, locate, recall, repeat, tell, underline.

2. Comprehension
At this level, students are basically expected to find out the gist or essence of a given concept.
Hence, it requires students to show skills in summarizing, elaborating, translating the concept
from one form of language to another etc. The common cognitive processes required at
comprehension level are translation, interpretation, and extrapolation. These learning
outcomes go one step beyond the simple remembering of material and represent the lowest level of understanding - for example, interpreting charts and graphs. The following instructional objective statement clarifies it:

At the end of this class, the student is expected to:


• Explain the difference between measurement and evaluation in his/her own words.
Some of the action verbs that are used in this level are: classify, convert, describe,
distinguish, explain, extend, give examples, illustrate, interpret, paraphrase, summarize,
translate, restate, tell in your own words etc.

3. Application
Application refers to the ability to use learned material in new and concrete situations.
Learning outcomes in this area require a higher level of understanding than those under
comprehension. In this level of the cognitive domain, students are expected to transfer the use of a rule, principle, formula, procedure, etc. to a context different from the one originally used. It
requires mastery of a concept well enough to recognize when and how to use it correctly in
an unfamiliar or novel situation. The fact that most of what we learn is intended to be applied
to problem situations in our everyday life demonstrates well the importance of application
objectives in the curriculum.

For example, accounting teachers show their students how to calculate the profit rate of a given firm using hypothetical data. After graduation, when these students are employed in a business firm, they are expected to calculate the profit rate of that firm. The difference between the college case and the business firm is the context, while the nature of the problem to be solved - the profit rate - is the same in both cases. Some of the action verbs
that are used at this level are: apply, arrange, compute, construct, demonstrate, discover,
modify, operate, predict, prepare, produce, relate, show, solve, use

4. Analysis
Analysis refers to the ability to break down material into its component parts so that its
organizational structure may be understood. This may include the identification of parts,
analysis of the relationships between parts, and recognition of the organizational principles
involved. Learning outcomes here represent a higher intellectual level than comprehension
and application because they require an understanding of both the content and the structural
form of the material.

For example, after the end of a football game, sport channels present the number of yellow and red cards given, the percentage of ball possession, the number of on- and off-target shots, and the number of goals and off-sides that each team had. The single game is broken down into these foundational units to make a clear-cut comparison between the two teams' performances. This is called football analysis.

In the analysis level of cognitive domain, students are expected to show how a complex
whole is formed from its fundamental constituent elements. Therefore, in general, students at this level are expected to show relationships, make comparisons, categorise units into groups and break a whole into its building blocks.
At the end of this class, the student is expected to:
Show how the different units (departments/colleges/faculties) in Bahir Dar University operate in a unitary manner.
Some of the action verbs that are used at this level are: analyze, associate, determine,
diagram, differentiate, discriminate, distinguish, estimate, infer, order, outline, point out,
separate, subdivide etc.

5. Synthesis
Synthesis refers to the ability to put parts together to form a new whole. This may involve the
production of a unique communication (theme or speech), a plan of operations (research
proposals) or a set of abstract relations (scheme for classifying information). Learning
outcomes in this area stress creative behaviors, with major emphasis on the formulation of
new patterns or structures.

In biology, you learned about the process that plants use to produce complex food chemicals from simpler chemical molecules in the presence of sunlight: photosynthesis. Similarly, at
synthesis level in the cognitive domain students are expected to produce a complex or whole
idea through combining simpler ideas or units. In addition to this, students are also expected
to produce novel or new ideas.
Example: At the end of this class, the student is expected to:
Show how the constructivist and behaviourist approaches to instruction can be combined and used in an integrated manner.

Some of the action verbs that are used at this level are: Combine, compile, compose,
construct, create, design, develop, devise, formulate, integrate, modify, organize, plan,
propose, rearrange, reorganize, revise, rewrite, tell, write etc.

6. Evaluation
Evaluation is concerned with the ability to judge the value of material (statement, novel,
poem, research report) for a given purpose. Judgments are to be based on definite criteria.
These may be internal criteria (organization) or external criteria (relevance to the purpose)
and the student may determine the criteria or be given them. Learning outcomes in this area are highest in the cognitive hierarchy because they contain elements of all of the other categories, plus value judgments based on clearly defined criteria.

At this level, students are expected to criticize, evaluate or judge an action or idea using
internal (personally set) or external (set by others) standards.
At the end of this class, the student is expected to:
Justify whether using continuous assessment is advantageous in Ethiopian context.
Some of the action verbs that are used at this level are: appraise, assess, compare, conclude, contrast, criticize, discriminate, evaluate, judge, justify, support, weigh, etc.

 Activity
Direction I: Identify the level of cognitive domain which each of the following objectives
measure.
1. The student will weigh up whether modular approach is best for teaching science courses.
2. By the end of the class, the student will summarize the main events of the Troy passage.
3. Given a presidential speech, the student will be able to point out the positions that attack a
political opponent personally rather than the opponent’s political programs.
4. Given a short story, the student will write a different but plausible ending.
5. Given fractions not covered in class, the student will multiply them on paper with 85
percent accuracy.
6. From memory, with 80 percent accuracy the student will match each Ethiopian hero with
his most famous battle.

Direction II: Indicate the level of objective in the cognitive domain which each of the
following questions measure.
1. Assume that it is 20 degrees centigrade at the base of Mount Batu, at an elevation of 100 m. Other things being constant, what will the temperature be at its peak, which is at 4307 m?
2. What is Education?
3. Why had Hitler killed more than 6 million Jews?
4. Which approach is best in terms of its applicability to University Education, Behavioural or Constructivist?
5. Define memory in your own terms.

Answers for Activity


Direction I
1. Evaluation 2. Comprehension 3. Analysis 4. Synthesis 5. Application 6. Knowledge
Direction II
1. Application 2. Knowledge 3. Analysis 4. Evaluation 5. Comprehension

B. Affective Domain
The affective domain includes those objectives which are concerned with changes in interests, attitudes and values and the development of appreciations and adjustments. In this domain, students are expected to advance (change) their attitudes, values, and emotional reactions towards an idea, event, or individual. Achievement of objectives stated in this domain leads to the formation of responsible, free, law-abiding, disciplined and dutiful student behaviour. Similar to the cognitive domain, the objectives are stated at different hierarchies/levels, from the simplest behaviour to the most complex.
Most complex:    Characterization
                 Organization
                 Valuing
                 Responding
Least complex:   Receiving (Attending)

1. Receiving (Attending)
Receiving refers to the student’s willingness to attend to particular phenomena or stimuli
(class-room activities, textbook, music etc.). Receiving represents the lowest level of learning
outcomes in the affective domain. In this stage, students are expected simply to give their attention or vigilance to the teacher's or demonstrator's actions. This is because attention is at the base of learning any skill or knowledge. Thus, students must stop any other activity and attend when the teacher demonstrates a skill or presents knowledge. See the
following example:
The student will listen actively when the teacher explains the difference between formative
and summative assessment.
Some of the action verbs that are used at this level are: follows, gives, holds, identifies,
locates, names, points to, selects, sits, erects, replies, uses.
2. Responding
Responding refers to active participation on the part of the student. At this level, the student
not only attends to a particular phenomenon, but also reacts to it in some way (i.e.,
willingness to respond, voluntarily reads beyond assignment; satisfaction in responding, e.g.,
reads for pleasure or enjoyment). At this stage, the teacher expects students to obey orders, i.e., to act according to verbal instructions, respect the rules and regulations of the school and follow classroom ground rules.
The following example clarifies this best:
The student will submit the assignments by the deadline.
Some of the action verbs that are used at this level are: comply with, obey, applaud, acclaim
etc.
3. Valuing
Valuing is concerned with the worth or value a student attaches to a particular object,
phenomenon or behavior. This ranges in degree from the simpler acceptance of a value to the
more complex level of commitment. Learning outcomes in this area are concerned with behavior that is consistent and stable enough to make the value clearly identifiable. Instructional
objectives that are commonly classified under attitudes and appreciation would fall into this
category. This includes acceptance of a value (desires to improve group skills), preference for
a value (assumes responsibility for the effective functioning of the group), and commitment
to a certain point of view (demonstrates commitment to social improvement).

Unlike at the responding level, students at this stage are expected to freely express their values, opinions or attitudes towards a perspective, an object or a person.
Example:
The student will be able to express his support or opposition on the nation’s stand against
religious fundamentalism.
Some of the action verbs that are used at this level are: differentiates, explains, initiates,
justifies, proposes, shares, etc
4. Organisation
For situations where more than one value is relevant, the need arises for organization.
Organization is concerned with bringing together different values, resolving conflicts
between them, and beginning the building of an internally consistent value system. Thus, the
emphasis is on comparing, relating and synthesizing values. Learning outcomes are concerned with:
• Conceptualization of a value (e.g., recognizing the responsibility of each individual for improving human relations), and
• Organization of a value system (e.g., developing a vocational plan that satisfies the need for both economic security and social service).

At this stage students are expected to resolve contradictory or conflicting views, beliefs or stances they hold towards a perspective, an object or a person. For instance, a student may live without being conscious of holding conflicting views about helping beggars, i.e., that it is a blessed activity (from a spiritual perspective) and a wrong one (from an economic policy perspective). Hence, an objective stated at this level expects the student to reconcile these contradictory views.
Example: By the end of the class, the student will reconcile his spiritual and economic views
on helping beggars.
Some of the action verbs that are used at this level are: balance, organize, formulate, accommodate, integrate, arrange, combine, generalize, modify, etc.
5. Characterization
At this level of the affective domain, the individual has a value system that has controlled his behaviour for a sufficiently long time for him to have developed a characteristic life style. Thus, the behavior is pervasive, consistent and predictable. Learning outcomes at this level
cover a broad range of activities but the major emphasis is on the fact that the behaviour is
typical or characteristic of the student. Instructional objectives that are concerned with the
student’s general pattern of adjustment (personal, social, emotional) would be appropriate here. For example: demonstrates self-reliance in working independently, maintains good health habits, and practices co-operation in group activities.

At this level students are expected to act out or practice what they believe. For instance, a
student who says that giving money to beggars strengthens their begging behaviour should
not give money. Therefore, unlike the valuing level, this level does not expect students simply to express a view but expects them to show it practically.
Example: During peer evaluation time, the student will evaluate his team mates objectively.
Some of the action verbs used at characterization level are respect, interpret, use evidence,
maintain objectivity.

 Activity
Matching
Direction: Under column “A” statements representing different levels of the affective
domain are found and under Column “B” the different levels of the affective domain are
found. Therefore, indicate by selecting from column ‘B’ the letter that represents the right
level of objective that fits to each statement found under column “A”. Note that one response
may be used once, more than once, or not at all when relevant.
Column A
1. Tries to practice the attributes of an effective teacher which one values.
2. Resolves his belief that keeping students closer makes them interested in your course but at the same time results in poor respect for you.
3. Grades as per the legislation of the university in one's teaching responsibility.
4. Believes that disabled students can succeed like able others if their needs are entertained.
5. Follows the principal's speech seriously at school meetings.

Column B
A. Receiving
B. Responding
C. Valuing
D. Articulation
E. Organization
F. Characterization

Answers for Activity


Matching
1. F 2. E 3. B 4. C 5. A

C. Psychomotor Domain
The psychomotor domain includes those objectives which are concerned with manual and
motor skills. The domain includes physical movement, co-ordination and use of motor skills.
Development of the skills requires practice and is measured in terms of speed, precision,
distance, procedures, or techniques in execution.

Generally, in this domain students are expected to improve their body-mind coordination. Hence, teachers' expectations range from the simple replication of skills displayed to them up to carrying out two or more motor activities smoothly and automatically. The domain contains levels listed in order from the simplest behavior to the most complex.
Most complex:    Naturalization
                 Articulation
                 Precision
                 Manipulation
Least complex:   Imitation


1. Imitation
Objectives at this level require that the student be exposed to an observable action and then overtly imitate that action, such as when a teacher demonstrates use of the microscope by placing a slide on the specimen tray. The performance is generally crude and imperfect.

At this level students are expected to observe and be able to repeat the action being visually
demonstrated by the teacher or the demonstrator. However, students are not expected to be accurate but simply to make an attempt.
An example is given below:
The student will assemble the mobile phone after observing the technician’s demonstration.
Some of the action verbs that are used at this level are: Adhere, begin, bend, assemble,
attempt, carryout, copy, calibrate, construct, dissect, duplicate, follow, mimic, move, practice,
proceed, repeat, replicate, reproduce, respond, organize, sketch, start, try, and volunteer.
2. Manipulation
Objectives at this level require the student to perform, or practice selected actions from
written or verbal directions without the aid of a visual model or direct observation. Action verbs are the same as those of the imitation level except that the actions are performed from spoken or written instructions.

Similar to the imitation stage, students at this stage are not expected to be accurate in the actions they acquire. However, unlike the imitation stage, students are expected to acquire the physical skill by reading a written manual or listening to a verbal order.
For example:
The student will assemble the mobile phone listening to the instruction given by the
technician.
Some of the action verbs that are used at this level are: acquire, assemble, build, complete,
conduct, do, execute, grasp, handle, implement, improve, maintain, make, manipulate,
operate, pace, perform, produce, progress, re-create, use.
3. Precision
Objectives at this level require the student to perform an action independent of either a visual
model or a written set of directions. Students are expected to reproduce the action with
control and to reduce errors to a minimum. Unlike the two previously stated levels, at the precision level students are required to perform a physical activity with a certain degree of accuracy or correctness.
The following example clarifies this best:
The student will be able to assemble the mobile phone at least 3 times appropriately given 5
chances.
Some of the action verbs that are used at this level are: accurately, proficiently, with balance, independently, with control, etc.
4. Articulation
Objectives at this level require the student to display the coordination of a series of related
acts by establishing the appropriate sequence and by performing the acts accurately, with
control as well as with speed and timing. Beyond carrying out activities with higher level of
proficiency, students are also expected to complete tasks efficiently (in time).
Example: The student will be able to assemble the phone perfectly in less than 5 minutes.
Some of the action verbs that are used at this level are: confidence, smoothness, coordination,
speed, harmony, stability, integration, timing, proportion
5. Naturalization
Objectives at this level require a high level of proficiency in the skill or performance being
taught. At this level, the behavior is performed with the least expenditure of energy and becomes routine, automatic, and spontaneous. Students are expected to repeat the behavior naturally and effortlessly time and again. At this highest stage, students are required to carry out two or more activities correctly while showing no sign of strain. Hence, at this level activities are done as if they were acquired naturally and automatically. This usually results from a long period of practice.
Some of the action verbs that are used at this level are: automatically, spontaneously,
effortlessly, with ease, naturally, with perfection, professionally, with poise, routinely.

UNIT SUMMARY
The major points that have been discussed in the unit are summarized below.
• In this chapter, we discussed objectives. Objectives refer to our instructional intent
in terms of the types of behaviour or performance pupils are expected to demonstrate as a
result of instruction. Objectives can be stated at different levels like national, institutional,
and curricular or lesson levels. To this end, we have global, educational and instructional
objectives.
• Bloom and his associates classified learning objectives based on three domains: cognitive,
affective and psychomotor domain. The cognitive domain emphasizes the attainment,
retention, and development of knowledge and intellect. This domain is divided into six
levels. They are knowledge, comprehension, application, analysis, synthesis, and
evaluation. There are specific terms that are used for each level of the domain. The
affective domain encompasses those behaviors characterized by feelings, emotions,
attitudes, interests, personality, and values. This domain has five levels: Receiving,
responding, valuing, organization, and characterization. The psychomotor domain refers
to muscular or motor behaviors. Running, using tools, speaking, and handwriting are
classified as psychomotor activities.
• All these domains are important in educational process. However, the most widely
emphasized domain in school curriculum is the cognitive domain.

SELF-TEST CHECKLISTS
It is time to check your understanding of educational objectives. Read the following items
and answer them by checking in one of the boxes under alternatives “Yes” or “No.”
Yes No
Can you express the roles of objectives in education?  
Can you state specific instructional objectives?  
Can you identify different types of objectives?  
Can you classify objectives in relation to Bloom’s taxonomy of objectives?  
Is there any box that you marked “No” under it? If your answer is “Yes” go back to your
text and read about it.

SELF-TEST EXERCISES
Dear student, attempt the following questions to see whether you have been able to understand Educational Objectives.
1. Explain Bloom’s taxonomy of educational objectives.
2. What are the different levels or forms of educational objectives? Explain with examples.
3. State the relationship among objectives, teaching learning activities and assessment with a
suitable example.
4. The student volunteers her answer by raising her hand in class. Which domain is best illustrated by this student's behavior?
5. Let’s assume that you are a physical education teacher. You want your students to think
that health and fitness are important and should be a part of their life style. In which domain of the Taxonomy of Educational Objectives would your goal be best classified? Justify.
6. To what extent are the course objectives you learnt directly related to the assessment
types your instructors were using to measure your learning progress?
7. How frequently were you assessing your students' progress to ensure whether the objectives were achieved or not?
8. Have you ever thought of the objectives of the course(s) you learn during the learning
process and when you study in preparation for exams?

UNIT FOUR
DEVELOPING CLASSROOM ACHIEVEMENT TESTS
INTRODUCTION
Dear learner, you have now reached the point of introducing yourself to the basic principles of writing classroom test items. This unit deals with how classroom tests are prepared, addressing the preparation of the table of specification, the selection of test format and the factors to be considered when selecting an item format. In this unit, major attention will be given to the procedures to be followed for developing valid and reliable test items. In addition, the common pitfalls teachers encounter regarding assessment, interpretation and process will be discussed to serve as a springboard for subsequent discussions. Lastly, some good practices that improve teachers' skill in preparing achievement tests will be indicated.

Unit Objectives
By the end of this chapter, you are expected to:
• Explain the procedures to be followed in developing valid and reliable tests.
• Design table of specifications in accordance with instructional objectives.
• Construct test items that evaluate the appropriate level of learning outcomes.
• Indicate the most frequently observed limitations in teacher-made tests.
• Identify factors to be considered when determining the type and size of items to be
prepared from different content and objective areas.
• Understand distinctions between a variety of achievement test items, their
characteristics and appropriate usage of each item as distinguished from the others.
• Recognize best practices that improve teachers’ skill in writing good test items

What types of assessment do teachers commonly use to evaluate students' academic achievement in your school? What are some of the major problems you have observed in classroom tests prepared by subject teachers? Mention them.
Dear learner, as you know, a test is the major and most commonly used instrument for the
assessment of cognitive behaviours. Thorndike and Hagen (1977) observe that teacher-made tests are central to the school's evaluation process. Such instruments are designed to appraise the outcomes of classroom instruction. A test designed to appraise what the individual has learned as a result of planned previous experience or training is an achievement test, since it relates to what has already been learnt and its frame of reference is the present or past.

Dear learner, the classroom achievement test is the test with which every teacher is familiar and which every teacher has to construct to judge the achievement of his/her students. If the test is well written and covers the entire course, it will be a better measure of students' achievement and can discriminate between good and poor learners. The construction of good tests requires specific skills and experience, which are neither easy to acquire nor widely available. A number of problems keep classroom tests from being accurate measures of students' learning.

Some Pitfalls in Teacher-Made Tests


1. Most teacher-made tests are not appropriate to the different levels of learning outcomes. Tests include too many questions measuring only knowledge of facts and ignore important higher-level objectives in the cognitive domain.
2. Most tests prepared by teachers lack clarity in their wording. The questions are ambiguous, imprecise, and often carelessly worded. This occurs when questions are written overnight, in a fire-fighting manner, and are therefore poorly edited.
3. Some classroom tests do not cover comprehensively the topics taught. One of the
qualities of a good test is that it should represent all the topics taught. But these tests
cannot be said to be a representative sample of the whole topic taught. The tests are too
short to provide an adequate sample of the body of content to be covered.
4. Too little feedback is provided. If a test is to be a learning experience, students must be
provided with immediate and descriptive feedback about which of their answers were
correct and which were incorrect.
5. Many of the test exercises fail to measure what they are supposed to measure. In other
words, most of the teacher-made tests are not valid. Validity is a very important quality of
a good test.
6. Most teacher-made tests fail item analysis: they do not discriminate properly and are not designed according to appropriate difficulty levels.
In order to reduce the possible occurrence of one or more of these problems, it is important to develop tests in a systematic manner. There is no algorithm or magic formula that guarantees a test of a definite level of quality, but following the simple heuristic (procedure) below may result in the development of better-quality tests.

What should be done to prepare tests before they are written and before they are ready
for administration? Did you say anything related to planning? If so, that is right.
Planning is the first step in preparing tests.

4.1 Planning for Classroom Test Construction


Dear learner, note that a good test reflects the goals of the course. It is congruent with the
skills that you want students to develop and with the content you emphasize in the class. A
test that covers a much broader range of material than that covered in the class will be
regarded as unfair by your students, even if you tell them that they are responsible for
material that has not been discussed in class.

The key element to effective achievement testing is careful planning. Test planning includes
several activities that need to be carried out by the teacher to devise a new test. It provides
greater assurance that the test will measure relevant learning outcomes. According to
Worthen, White, Fan, and Sudweeks (1998) development of teacher made tests passes
through the following steps:

Step 1: Define the Purpose of the Test


The purpose of the test will determine the kind of test to be used. This in turn will determine the breadth and depth of the test coverage, item difficulty, item size, the score
reporting and interpretation etc. For instance, if the purpose of the test is to make decisions
whether or not examinees can demonstrate mastery in a given area of content and
competencies then criterion referenced tests become appropriate. But if it is to rank the entire
set of examinees in order to make comparisons of their performances relative to one another
norm-referenced testing becomes the appropriate choice.
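To make the contrast concrete, here is a minimal Python sketch; the scores and the 80% mastery cut-off are hypothetical illustration values, not taken from this module. A criterion-referenced interpretation compares each examinee to a fixed standard, while a norm-referenced interpretation ranks each examinee against the group.

    # Criterion-referenced vs. norm-referenced interpretation (hypothetical data).
    scores = [55, 62, 70, 74, 78, 81, 85, 90, 93, 97]  # percent-correct scores

    # Criterion-referenced: compare each examinee to a fixed mastery cut-off.
    cutoff = 80
    masters = sum(1 for s in scores if s >= cutoff)
    print(f"{masters} of {len(scores)} examinees reached the {cutoff}% standard")

    # Norm-referenced: locate one examinee's score relative to the whole group.
    def percentile_rank(score, group):
        below = sum(1 for s in group if s < score)
        return 100 * below / len(group)

    print(f"a score of 85 falls at the {percentile_rank(85, scores):.0f}th percentile")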
Step 2: Prepare a Table of Specification
What is a table of specifications?
This is a two-way chart which relates the instructional objectives to the course content and
specifies the relative emphasis to be given to each type. The purpose of the table of
specification is to provide assurance that the test will measure a representative sample of the
learning outcomes and the subject matter topics to be measured.
The use of Table of specification/Test-blue print
Generally, the use of table of specification or test blue print in test development will help
ensure that;

A. only those objectives actually pursued in instruction will be measured;
B. each objective will receive the appropriate relative emphasis in the test; and
C. by using subdivisions based on content and behaviours, no important objectives will be overlooked or misrepresented.
The following example will clarify the components of table of specification and its
development process. Preparing table of specification involves:
A. Obtaining a list of instructional objectives treated in the class
This describes the type of performance the pupils are expected to demonstrate, and it reminds the teacher to consider the instructional objectives when planning classroom tests. If a comprehensive list of instructional objectives and specific learning outcomes has already been prepared, it is simply a matter of selecting those outcomes that can be measured by paper-and-pencil tests. If such a list is not available, a set of instructional objectives can be prepared
for the classroom test. Some examples of instructional objectives and learning outcomes are
given below.
At the end of the course, the student will be able to:
(The objectives section comprises instructional objectives and samples of learning
outcomes drawn from it.)
1. Knows the concept of assessment and evaluation of learning;
• define the word test, measurement, assessment & evaluation
• explain the difference between assessment and evaluation.
2. Understand the role of educational objectives in assessment and evaluation of
learning;
• express the role of educational objectives in assessment and evaluation
• identify general objectives from specific objectives
• list the domains of educational objectives and their levels.
3. Apply the principles of assessment of learning;
• identify the principles of assessment and evaluation of learning.
• show the application of the principles of assessment in the local context.
B. Outlining the instructional content areas covered in class
The second step in preparing the test specifications is to make an outline of the course, or a
more detailed list of topics and subtopics. The amount of detail to include in the content outline is somewhat arbitrary, but it should be detailed enough to ensure adequate sampling during test construction and proper interpretation of results. Some examples of content outlines are given from this course (Educational Assessment of Learning) as follows:
Content outlines
- Definition of basic terms
- Purpose of assessment in education
- The role of educational objectives in assessment and evaluation
- Principle of assessment and evaluation of learning
C. Preparing the two-way grid/chart
The final step in building a table of specification is to prepare the two-way chart by listing the major content areas down the left side of the table and the objectives across the top. The chart relates the instructional objectives to the course content and specifies the nature of the test sample: it depicts how many questions are to be drawn from each content area and each objective.

Sample of a table of specification in Assessment and Evaluation of Learning course for 6th year summer students, final examination, 2022

Contents                                           Knowledge  Understanding  Application  Analysis  Synthesis  Evaluation  Total
Definition of terms                                    3           2             2           -          -          -          7
Purpose of assessment in education                     2           5             4           1          -          -         12
The role of educational objectives in assessment       3           6             3           1          -          -         13
Principle of assessment                                2           3             3           -          -          -          8
Total                                                 10          16            12           2          -          -         40
• Notice that there is an increasing complexity from left to right in the types of objectives
to be measured, that is, test items include simple knowledge of facts and higher learning
objectives such as synthesis and evaluation. Performance that demonstrates knowledge
or comprehension, for example, is at the lower levels of the cognitive domain; performance reflecting the ability to synthesize or evaluate is higher-level cognitive performance.
• The relative emphasis given to each instructional objective and each content area should, of course, reflect the emphasis given during instruction. In assigning relative weights, the importance attached to the learning outcomes and the amount of instructional time devoted to each can serve as guidelines.

Dear learner, which of the objectives on the example above were given more emphasis?
Which of them obtained least emphasis? Which content areas obtained the highest
representation? Which one obtained the least representation? What are the implications of
these differences? What factors should be considered for determining how many items from
each content and objective area?

You need to determine what proportion of the test items should be devoted to each objective
and each content area. Here, a number of factors should be considered:
1. How important is each area in the total learning experience?
2. How much time was devoted to each area during instruction?
3. What relative importance do curriculum specialists assign to each area?

Contents (chapters) that are wider in scope, to which longer hours of instructional time are devoted, and which serve as a foundation for learning subsequent chapters/courses should receive greater weight in the test.
Example: Sample of a table of specification in Assessment and Evaluation of Learning course (cell entries show the number and type of items with their point values; row and column totals are percentages of the total mark)

Content                                 Knowledge                         Understanding                      Application                     Total
Definition of terms                     2 fill-in-the-blank               1 essay (2 points)                 -                               4%
                                        (1 point each)
Purpose of assessment and evaluation    1 multiple choice (1 point)       4 multiple choice (1 point each)   -                               5%
The role of objectives in education     2 multiple choice (1 point each)  1 essay (5 points)                 -                               7%
Types of tests                          5 matching (1 point each)         3 multiple choice (1 point each)   -                               8%
Table of specification                  1 multiple choice (1 point)       -                                  1 essay: preparing a table
                                                                                                             of specification (5 points)     6%
Total                                   11%                               14%                                5%                              30%

(The analysis, synthesis and evaluation columns contain no items in this sample.)

• Typically, the weighting is done by first assigning percentages to each content area and each objective level, and then allotting the percentages, or numbers, of test items to each of the two-way cells within the table. Proper weighting makes it possible to obtain a representative sample of the intended learning outcomes of instruction that can be evaluated by paper-and-pencil tests.
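To see the arithmetic behind such an allocation, consider the following minimal Python sketch. The content areas, weights and total item count here are hypothetical illustration values, not the figures from the tables above; the point is only that each cell count is the total number of items multiplied by its content weight and its objective weight.

    # Allocating items across a table of specification (hypothetical weights).
    content_weights = {
        "Definition of terms": 0.20,
        "Purpose of assessment": 0.30,
        "Role of objectives": 0.30,
        "Principles of assessment": 0.20,
    }
    objective_weights = {"Knowledge": 0.25, "Understanding": 0.40, "Application": 0.35}
    total_items = 40

    for content, cw in content_weights.items():
        # Each cell gets total_items x content weight x objective weight.
        row = {obj: round(total_items * cw * ow) for obj, ow in objective_weights.items()}
        print(f"{content:<26} {row}  row total: {sum(row.values())}")

Because of rounding, the resulting cell counts may need small manual adjustments so that the rows and columns still sum to the intended totals.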

Step 3: Selecting Appropriate Item Format


Once the number of questions to be drawn from each objective and content area is determined, the teacher can move on to the selection of an appropriate item format. This decision can be made considering the use, advantages, and disadvantages of each item format. A detailed discussion of these elements for each item format is made in the subsequent chapters, organized into selected-response and constructed-response item formats. Furthermore, the following may also serve as selection criteria:
A. The purpose of the test
This is the primary criterion to consult when deciding whether to use, say, a true/false or an essay item. For instance, if the purpose of the test is to check whether students
have shown improvement in their writing skills, it is appropriate to use essay. But if the
purpose of the test is to check whether students know the meaning of a given vocabulary it is
appropriate to use objective item formats.
B. The time available to prepare and score the test
If the teacher has little time for scoring, selected-response item formats are appropriate, since students indicate their answers by selecting the letters that represent the correct options. However, if time for preparing items is short, constructed-response formats are better, as they do not demand that the teacher produce three or more plausible alternatives.
C. The number of pupils to be tested
If there are large numbers of examinees, selected-response formats are better, as scoring will be tedious and time-consuming when constructed-response ones are used.
D. The physical facilities available for reproducing the test
If there is a serious shortage of typing, duplicating, stapling and packing resources, it is advisable to use constructed-response item formats, since the test items can be written on the board and students provide responses on separate sheets of blank paper.
E. Age and other characteristic of students
Selected-response items are preferable for younger children who have poor muscular,
reasoning and linguistic development.

F. Your skill in constructing the different types of items
Your competence in writing items in the different formats is also an important criterion when selecting an item format. Use the format in which you are most competent.
Step 4: Write and pilot test the initial draft of the test
This is the last step in planning teacher-made tests. At this stage, the teacher writes the test items, improves them using comments from self-review and colleagues, and tries them out on a sample of students.
4.2 General Principles of Writing Teacher Made Tests
Different types of questions can be devised for an achievement test, for instance, multiple
choice, fill-in-the-blank, true-false, matching, short answer and essay. Although each type of
question is constructed differently, the following principles apply to constructing questions
and tests in general. Being cognizant of these principles and reflecting them in our testing practice helps to improve the quality of the items.
1. Directions to the examinees should be as simple, clear, and precise as possible, so that
even those students who are of below average ability can clearly understand what they
are expected to do.
2. Questions must be written in simple language. If the language is difficult or ambiguous,
even a student with strong language skills and good vocabulary may answer it
incorrectly if his/her interpretation of the question is different from the author’s
intended meaning.
3. Test items must assess specific ability or comprehension of content developed during
the course of study. Write the questions as you teach or even before you teach, so that
your teaching may be aimed at significant learning outcomes.
4. Devise questions that call for comprehension and application of knowledge and skills rather than emphasising rote learning. Some of the questions should also aim at appraising examinees' ability to analyze, synthesize, and evaluate novel instances of the concepts. This will encourage students to engage in critical, creative and problem-solving practices.
5. Questions should be written in different formats, e.g., multiple-choice, completion,
true-false, short answer, essay, etc., to maintain the interest and motivation of the students. Above all, it is not possible to measure the achievement of all objectives found at different levels using a single item format.

6. The items should be phrased so that the content rather than the format of the statements
will determine the answer. Sometimes the item contains “specific determiners” which
provide an irrelevant clue to the correct answer. For example, statements that contain
terms like always, never, entirely, absolutely, and exclusively are much more likely to
be false than to be true. On the other hand, such terms as may, sometimes, as a rule, and
in general are much more likely to be true. Besides, care should be taken to avoid
double negatives, complicated sentence structures and unusual words.
7. Items pertaining to a specific topic or of a particular type should be placed together in the test. Such a grouping facilitates scoring and evaluation. It also helps examinees to think about and answer items that are similar in content and format without fluctuations of attention or changes of mindset.
8. Scoring procedures must be clearly defined before the test is administered.
9. Item analysis should be carried out to make necessary changes, if any ambiguity is
found in the items.
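As a preview of what such item analysis involves, the two most common classical indices can be computed in a few lines of Python. The response data below are hypothetical: each pair holds a student's response to one item (1 = correct, 0 = incorrect) and that student's total test score.

    # Classical item analysis for a single item (hypothetical data).
    responses = [(1, 35), (1, 32), (0, 30), (1, 28), (0, 22),
                 (1, 21), (0, 18), (0, 15), (1, 14), (0, 10)]

    # Difficulty index p: proportion of examinees answering the item correctly.
    p = sum(r for r, _ in responses) / len(responses)

    # Discrimination index D: proportion correct in the upper half minus the
    # proportion correct in the lower half, ranked by total test score.
    ranked = sorted(responses, key=lambda pair: pair[1], reverse=True)
    half = len(ranked) // 2
    upper = sum(r for r, _ in ranked[:half]) / half
    lower = sum(r for r, _ in ranked[half:]) / half
    D = upper - lower

    print(f"difficulty p = {p:.2f}, discrimination D = {D:.2f}")

An item with p near 0.5 and a clearly positive D is usually doing useful work; a negative D (weaker students answering correctly more often than stronger ones) typically signals ambiguity in the item.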

4.3 Qualities of Good Item Writer

Dear learner, as a secondary school teacher, what does a teacher need in order to be a good item writer? Mention some qualities.
The process of writing good test items is not simple - it requires time and effort. It also requires certain skills and proficiency on the part of the item writer, some of which can be improved by formal course work, while others require considerable practice. To be a good item writer one should be proficient in the following six areas.
1. Know the subject matter thoroughly
The greater the item writer's knowledge of the subject matter, the greater the likelihood that she/he will know and understand facts and principles as well as some of the popular misconceptions. This latter point is of considerable importance when writing selection-type items in general, and multiple-choice items in particular (because the item writer must supply plausible although incorrect answers).
2. Know and understand the students being tested
The kinds of students the teacher deals with will determine in part the kind of item format, vocabulary level, and level of difficulty of the test items. For example, a primary school teacher seldom uses multiple-choice items because young children are better able to respond to the short-answer type. The vocabulary level used for a class of gifted children may be very different from that used with a class of learners with intellectual disabilities. The classroom teacher who knows and understands his/her students will generally establish more realistic objectives and develop a more valid measurement device than will the teacher who fails to consider the characteristics of her/his students.
3. Be skilled in verbal expression
It is essential that the item writer clearly conveys to the examinees the intent of the question. In an oral examination, the student may have the opportunity to ask for and receive clarification of the question when he/she does not understand what the teacher is asking, but in a paper-and-pencil test this is rarely possible. Hence, items should be clearly written, and to do that teachers need skill in using the language of instruction.
4. Be thoroughly familiar with the various item formats
The item writer must be knowledgeable about the various item formats - their strengths and weaknesses, and the errors commonly made in each type of item - and about guidelines that can assist him/her in preparing better test items.
5. Be persevering (try hard to write and improve)
Writing good test items, regardless of their format, is both an art and a skill that generally improves with practice. There are very few professional item writers so gifted that they can write items requiring absolutely no editing or rewriting. Depending upon the skill of the item writer, the number of items that need rewording or will be rejected will vary. The important thing is that classroom teachers, who are trained as teachers rather than as item writers, should persevere and not give up, even though the task seems demanding.
6. Be creative
Item writing needs creativity. The teacher's ability to write items in novel ways is very crucial. Some mistakes in test writing occur simply through carelessness on the part of teachers. A good item writer brings his/her creativity, way of explanation, and use of examples into the writing.

 Activity
1. What are the uses of preparing table of specification?
2. What elements are included in the table of specification?
3. What determines the number of items to be constructed from each instructional objective
and each topic of course content?
4. What will happen if you do not prepare table of specification while constructing a test?

5. Summarize the general principles in constructing teacher made tests.
6. What do you think is the disadvantage of using only one type of test format? Why?
Explain briefly.

UNIT SUMMARY
Dear student, in this unit we have learned about planning classroom test items. The following
points have been emphasized in the unit.
• In the planning stage of a test, the teacher should consider elements like defining the
purpose of the test, preparing table of specifications, selecting appropriate test item
format, and developing initial draft of the test.
• Table of specifications (test blueprint) helps the teacher to ensure that only those objectives actually pursued in instruction will be measured; that each objective will receive the appropriate relative emphasis in the test; and that, by using subdivisions based on content and behavior, no important objectives will be overlooked.
• When the teacher selects test format, he/she should consider the following factors: the
purpose of the test, the time available to prepare and score the test, the number of students
to be tested, the physical facilities available for reproducing (duplicating) the test, age of
students and teachers’ skill of item writing.
• Beyond this, being cognizant of some of the general principles in developing teacher
made tests and reflecting them in our practice help to improve the quality of tests we
produce. In general, these principles are related to keeping items free from language
errors (like vagueness and ambiguity) and guessing clues, giving clear and specific
directions, preparing items that measure diverse levels in the cognitive domain and using
varieties of item formats.
• To write good items the item writer should have the following qualities: knowing the subject matter thoroughly, knowing and understanding the students being tested, being skilled in verbal expression, being thoroughly familiar with the various item formats, being persevering (trying hard to write and improve) and being creative.

SELF-TEST CHECKLISTS
It is time to check your understanding of classroom test preparation. Read the following
items and answer them by checking in one of the boxes under alternatives ‘” Yes” or “No”.
Yes No
Can you identify the elements of table of specification?  
Can you prepare table of specification?  
Can you list factors to be considered when selecting item format?  
Can you summarize steps of item writing in your own words?  
Can you explain some of the principles of test item writing?  
Is there any box that you marked “No” under it? If your answer is “Yes” go back to your
text and read about it.

SELF-TEST EXERCISES


1. Select a topic (s) from your subject of specialization and prepare a table of specification
that will guide you to assess the achievement of your students on the topic(s).
a) Outline the topics/subtopics you have chosen. If the syllabus does not provide such information, you can develop one with the aid of the teacher's guide or the textbook.
b) For each of the topics you have selected, write general objectives and
corresponding specific objectives to be attained by the learners at the end of
the instruction.
c) For each of the objectives you have developed, classify them on the basis of
the domain (cognitive, affective or psychomotor) to which they are related to.
2. Prepare a table of specifications for the topics you have selected. A sample design for
appropriate table of specifications is provided on the course module.
3. Develop actual test items (questions) of different item format which conform to the
guidelines specified in test construction.

UNIT FIVE
WRITING SELECTED RESPONSE TEST ITEMS

INTRODUCTION
Dear learner, in the previous section you studied about planning classroom tests. We hope
you are familiar with steps in planning classroom tests and principles guiding the selection of
one format of an item over another. In this unit, you will read about and come to understand
the procedures test developers follow in writing selected-response test items of different kinds.
To produce quality items, one needs to know the principles and guidelines for writing them. In
this module, you will examine different types of selected-response test item formats.

In schools, there has been a dominant tradition of using objective (selected or structured response)
items for assessing students' learning. However, the construction of quality selected-response
items has been a challenge for teachers. Thus, the focus of this chapter is to help you improve
your skills in writing such items. Hence, in this chapter a detailed discussion is made
regarding the nature, use, advantages and limitations of selected response item formats.
Beyond this, guidelines for constructing better selected-response items will be offered.

Unit Objectives:
By the end of this chapter, you are expected to:
• Understand the nature of each selected-response item type.
• Know what each of the item formats among the selected-response items measures.
• Clarify the meaning and use of context-dependent items.
• Indicate the basic differences of context-dependent items from other objective item
formats.
• Understand distinctions between a variety of items, their characteristics and
appropriate usage of each item as distinguished from the others.
• State the advantages and limitations of each of the selected response item formats.
• Be aware of the criteria while developing good items in each of the item formats.
• Evaluate the qualities of selected response items developed by others relying on the
criteria for constructing items.

Dear learner, think of the different types of tests you give in your class. Is there any
difference between selected-response and constructed-response tests? Where does the
difference lie? Try to answer these questions before reading on; if you have, that is great.

Based on the kind of response required from the learner, there are two forms of test items:
selected-response test items and constructed-response test items. Selected-response test
items are the most familiar form of assessment, in which the learner is asked to select the
correct/best response from a set of specified alternatives; the learner chooses an option
rather than producing his or her own answer. Such test item formats include multiple-choice,
matching, and true-false items.

Alternatively, a test item can require a student to develop his or her own answer in response
to a question, stimulus, or prompt. An assessment of this form, such as one that requires
supply type (short answer and completion items) and essay or a step-wise problem solving
process is called a constructed-response test item. Because they are used as an alternative to
selected-response assessment, they are also called alternative assessment.

Dear learner, have you distinguished the difference between selected response and
constructed response item format?

The cognitive capabilities required to answer selection items are different from those required
by supply items, regardless of content. In principle, both selection and supply items can be
used to test a wide range of learning objectives. In practice, most people find it easier to
construct selection items to test recall and comprehension and essay items to test higher-level
learning objectives. A major decision is the relative percentage to be assigned to constructed-
response items (essay and completion) as opposed to selected-response items (multiple-
choice, true-false, and matching). In practice, teachers may often find the best tests in many
subjects will contain both selected- and constructed-response items.

Selected response tests are appropriate when:


i. In general, they should always be used when outcomes dealing with the medium and low
levels of Bloom's taxonomy are being evaluated.
ii. The group to be tested is large and the test may be reused.
iii. Highly reliable scores must be obtained as efficiently as possible.

iv. Impartiality of evaluation, fairness, and freedom from possible test scoring influences are
essential.

Advantages and limitations of selected-response assessment:

What advantages do you think selected response item formats provide?


In general, item formats that belong to selected-response assessment offer the following
advantages and limitations.
Advantages:
- Significantly reduce marking time, since students represent their answers with letters or
simply mark the options; scoring is less likely to be biased, and measurement becomes
more efficient.
- If properly developed, they can be used to measure intermediate-level outcomes - to
comprehend, apply and analyze.
- Assessment is speedy, as students do not take time to write their answers but simply
select the letters that represent them; such tests require much less time for students to
complete and for teachers to mark.
- The scope of the test is broad, since more items can be included on a selected-response
test and wider content areas can be treated in the exam.
- Automatic feedback can be provided to students, particularly when used in computer-based
assessment.
- There is potential for conducting assessment more frequently, making the monitoring of
learning progress feasible.
- Questions can be pre-tested in order to evaluate their effectiveness.
Limitations:

What limitation do you think selected response item formats have?


In general, item formats that belong to selected-response assessment have the following
limitations:
• Significant amount of time is required to construct good questions.
• Writing questions that test higher-order skills requires much effort compared to
constructed-response items.
• Cannot easily and directly assess written expression, logical presentation, creativity
and performance. Cannot be used to measure outcomes which require students to
generate ideas.

Dear learner, have you recognized the merits and demerits of selected-response item
format?

So far, we have seen the general advantages and limitations that characterize selected-response
item formats. Below, the nature of each item format, its advantages and limitations,
and guidelines for improving the construction of each selected-response item format
are given.

5.1 True or False (Alternative Response) Item Format


5.1.1 Description about the True-False Item Format
As a student, from lower grades till this level you have been quite familiar with this item
format. In this item format, a statement will be given and students express their agreement or
disagreement to the truthfulness/correctness of the statement by choosing either of the two
mutually exclusive options. The mutually exclusive options can be given as True or False,
Correct or Incorrect, Yes or No, Right or Wrong, Valid or Invalid etc.

What type of learning outcomes do True-False items measure?


The most common use of the true-false item is in measuring the examinee's ability to identify
the correctness of statements of fact, definitions of terms, statements of principles, and the
like, as well as the ability to distinguish fact from opinion. Another aspect of understanding
that can be measured by the true-false item is the ability to recognize cause-and-effect relationships.

5.1.2 Advantages and Limitations of True-False Items

What do you think is the greatest advantage of True/False item format over others?
Advantages
The following are some of the major advantages of true-false item formats:
1. They are suitable for young children or pupils who are poor readers and writers.
2. They can cover a larger amount of subject matter in a given testing period than any other
objective item. This increases the representativeness of the questions across the different
outcomes or sub-units of a course, supporting a good level of content validity.
3. Problems that arise when responding to short-answer and essay questions, such as
misspellings, the possibility of multiple answers, and illegible or untidy handwriting, do
not occur in this item format. Hence, scoring is quick, reliable, and objective.

4. They are appropriate when 3 or 4 plausible (equally attractive) distracters cannot be
found for a multiple-choice item.
5. If carefully constructed, they can measure the higher mental processes of understanding,
application, and interpretation. For instance, the following example is developed at the
application level of the cognitive domain.
T F: If x+3x = 9, then the value of x is 2.
To answer the above item, the student must work through the appropriate mathematical
algorithm (here x + 3x = 4x = 9, so x = 2.25 and the statement is False).
Note: True-false items are most useful in situations in which there are only two possible
alternatives (for instance right, left, more, less, who, whom, and so on) and special uses
such as distinguishing fact from opinion, cause from effect, superstition from scientific
belief, relevant from irrelevant information, valid conclusions, and the like.

What do you think are the limitations of True/False item format?


Limitations
The following are some of the limitations of True/False item format.
1. Among the many limitations of this item format, the most visible one is its high
susceptibility to guessing. Any layperson, without having attended a lecture or done any
reading on a given subject matter, has a 50% chance of getting the correct answer by mere
guessing (a short computation at the end of this subsection makes this concrete). This
would be impossible if the item format used were a supply type.
2. Like any other objective item format, cheating is highly probable, as copying others'
answers written as T or F letters is very simple.
3. There are many instances in which statements are not unequivocally true or false, and so
cannot be answered accordingly.
4. True-False items tend to rely heavily upon rote memorization of isolated facts, thereby
trivializing the importance of understanding those facts.
Note: A common criticism of the true-false item is that an examinee may be able to recognize
a false statement as incorrect but still not know what is correct. To overcome such
difficulties, some teachers prefer to have the students change all false statements to true.
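
To make the guessing effect concrete, here is a minimal, purely illustrative Python sketch
(not part of the original discussion) that computes the binomial probability of passing a test
by blind guessing alone:

    from math import comb

    def prob_at_least(n_items, pass_mark, p_guess):
        # Probability of at least `pass_mark` correct answers when every
        # item is answered by an independent guess with success rate p_guess.
        return sum(comb(n_items, k) * p_guess**k * (1 - p_guess)**(n_items - k)
                   for k in range(pass_mark, n_items + 1))

    # A 10-item test with a pass mark of 6 out of 10:
    print(prob_at_least(10, 6, 0.50))  # true-false: about 0.38
    print(prob_at_least(10, 6, 0.25))  # 4-option multiple choice: about 0.02

On a short true-false test, a blind guesser passes almost two times in five; with four options
per item, the same guesser passes only about once in fifty attempts.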

5.1.3 Suggestions for Constructing True/False Items


Dear learners, we now turn to the suggestions teachers follow to develop quality true-false
items. Item construction requires technical and creative skills on the part of teachers.

Learning to write good items is important if the test is to be of value to students and teachers
to measure the required learning outcomes, i.e., students’ academic performance.
1. Test significant content of a course and avoid trivial statements.
Such items cause students to direct their attention toward memorizing details at the
expense of more general knowledge and understanding.
For instance, the important content in Educational Assessment and Evaluation course is
Benjamin Bloom’s taxonomy of instructional objectives rather than his ethnic
origin/race. Hence, the item stated below is trivial or less important:
T F: Benjamin Bloom had Jewish blood origin.
2. Write items that can be classified unequivocally as either true or false; if an item
expresses an opinion or an arguable theory, attribute it to its source. Look at the
following example:
Poor T F: Childhood experiences are foundations for adult personality.
This statement cannot be judged indisputably as right or wrong, as there is disagreement
among psychologists over childhood experiences as the foundation of adult personality.
However, if it is written citing the proponents or opponents of the perspective, it can be
judged as right or wrong:
Better T F: According to Freud, childhood experiences are foundations for adult
personality.
3. Avoid taking statements verbatim or directly from textbooks. The rationale is that, on
the one hand, this pushes students to engage in rote learning rather than grasping the gist;
on the other hand, the context in which the statement appears in the textbook (or exercise
book) may not be available on the exam. For example:
Poor T F: The square of the hypotenuse of a right triangle equals the sum of the
squares of the other two sides.
Better T F: If the hypotenuse of an isosceles right triangle is 7 inches, each of the
two equal sides must be more than 5 inches.
4. Include only a single major idea in each item and avoid compound statements, which
can be neither true nor false unless the item measures a cause-and-effect relationship.
An item based on a single idea is usually easier to understand than one based on two or
more ideas. It is not advisable to use items that contain two ideas: one idea may be false
while the other is true, and the student will then struggle to decide on an answer. To see
this, look at the following true-false item.

Poor T F: Ethiopia is the oldest origin of human civilization and currently it is among
middle income countries of the world.
Better T F: Ethiopia is the origin of human civilization.
Better T F: Currently Ethiopia is grouped among middle income countries of the
world.
5. Avoid tricky questions. Tricky questions are those that trick students through deliberate
misspellings. Since the purpose of an exam is to check how far students have achieved
the instructional objectives, not how cautious they are about being fooled by silly
misspellings, such items should be avoided.
Poor T F: The largest sports kit producer is Addidos.
In the above example, the teacher marks the student wrong for choosing the True
option, on the grounds that the company's name is deliberately misspelled.
Better T F: The largest sports kit producer is Adidas.
6. Avoid use of ambiguous words.
For example look at the following true false items.
Poor T F: Large numbers of endemic animals are found in Awash National
Park.
Poor T F: Blood clotting takes place in a few minutes.
The above two true-false items cannot be unequivocally answered either true or false,
because the terms large and a few are not definite. How many animals qualify as 'large
numbers', and how quickly blood must clot to count as 'a few minutes', are not defined.
The same is true of words like some, few, a lot of, etc. Thus, instead of using these words,
it is preferable to state definite numbers in items.
7. Avoid using absolute degree indicator terms like "always," "all," or "never," which
tend to make the statement false; relative degree indicator terms like "usually,"
"often," and "many" usually make the statement true.
T F. All Americans are educated.
T F. Most Americans are educated.
In the first example, the probability of the statement being true is low, since there might
be at least one American who has not received an education, perhaps because of disability
or living in a remote area. In the second statement, the probability of being True is high,
as this is realistic to expect of a nation whose economy and political life are built on
knowledge. All in all, this kind of effective guessing by test-wise students arises because
truth is rarely absolute (it is context-bound), which makes statements containing absolute
degree indicators unsound to judge as True.
8. Avoid using negatively worded statements, especially double negatives. When a
negative word must be used, it should be CAPITALIZED, underlined or put in
italics or in bold so that students do not overlook it.
Poor: None of the steps in the experiment was unnecessary
Better: All of the steps in the experiment were necessary.
Example
Poor: Sigmund Freud was not the first person to identify the subject matter of psychology.
Better: Sigmund Freud was NOT the first person to identify the subject matter of psychology.
Best: Sigmund Freud was the first person to identify the subject matter of psychology.
9. Put the items in a random order so as to avoid response patterns that could serve as
clues (such as T, T, F, T, T, F). In addition, make true and false occur as correct answers
approximately equal numbers of times (a small sketch at the end of these suggestions
shows one way to do this). This helps to minimize guessing by test-wise students, who
tend to pick one particular option whenever they find an item difficult, because they know
the teacher has the habit of favoring one of the options as the correct answer.
10. Try to avoid long drawn-out statements or complex sentences with many qualifiers.
This will help students to understand what is asked and reduce language barrier. Be aware
that extremely long or complicated statements will test reading skill more than content
knowledge.
Example;
Despite the theoretical and experimental difficulties of determining the exact pH value of
a solution, it is possible to determine whether a solution is acidic by the red color formed
on litmus paper when it is inserted into the solution. (Poor)
Litmus paper turns red in an acid solution. (Better)
11. Avoid making items that are True consistently longer than those that are False or
vice versa. These clues result from teachers’ effort to make the statement that has to be
answered as True less ambiguous, arguable and vague. Hence keep sentence length
balanced between items that are to be answered as True and False.
12. The number of true statements and false statements should be approximately equal.
Constructing a test with approximately equal numbers of true and false statements will
prevent response sets from unduly inflating or deflating students' scores: some students
consistently mark statements "true" when in doubt about the answer, whereas others
consistently mark them "false". However, the teacher should not consistently use exactly
the same number of true and false items, as this provides a clue to the student who is
unable to answer some of the test items. The best procedure is to vary the percentage of
true statements somewhere between 40 and 60 percent. Under no circumstances should
the statements be all true or all false.
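
One practical way to apply suggestions 9 and 12 when assembling the final form of the test is
to shuffle the items and then check the share of items keyed True. The following is a minimal
Python sketch, with placeholder statements rather than real items:

    import random

    # (statement, key) pairs; the statements are placeholders only.
    items = [("Statement 1", True), ("Statement 2", False), ("Statement 3", True),
             ("Statement 4", False), ("Statement 5", False), ("Statement 6", True),
             ("Statement 7", False), ("Statement 8", True), ("Statement 9", False),
             ("Statement 10", True)]

    random.shuffle(items)  # suggestion 9: no predictable T/F answer pattern

    share_true = sum(key for _, key in items) / len(items)
    # Suggestion 12: keep the share of True keys between 40% and 60%,
    # without making it exactly 50% on every test.
    assert 0.4 <= share_true <= 0.6, "rebalance the true/false mix"

    for number, (statement, _) in enumerate(items, start=1):
        print(f"{number}. T F  {statement}")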

SELF CHECK EXERCISES


See the following checklist for constructing good true false items. Read each of the following
questions and put a tick mark in the blanks under “Yes” or “No” column.
Yes No
Is this the most appropriate type of item to use?  
Have trivial items been avoided?  
Have tricky statements been avoided?  
Have negative statements, especially double negatives been avoided?  
Are the statements brief and specific?  
Have two independent ideas been avoided from a single item?  
Are opinions attributed to some source?  
Are ambiguous words avoided?  
Are specific determiners avoided?  
Are true statements and false statements approximately equal in length?  
Are true statements and false statements made approximately but not
exactly equal in number?  
Note: If you have ticked any "No" options, the items are poor and need revision.

5.2 MATCHING ITEMS
5.2.1 Description about the Matching Item Format
In this section, we will learn about another commonly used type of objective type test.
A matching exercise consists of two parallel columns, with each word, number, or
symbol in one column being matched to a word, sentence, or phrase in the other
column. The items in the first column, for which a match is sought, are called
premises, and the items in the second column, from which the selection is made, are
called responses. The student is asked to associate each premise, on some basis, with
a response to form a matching pair.

What do you think are the uses of matching exercises? Can you list some of their
uses? Try to list some before you go to the following section.
Uses of Matching Exercises
1 When you have a number of questions of the same type (homogeneous), it is advisable to
frame a matching item in place of a number of similar multiple-choice questions.
2 Whenever learning outcomes emphasize the ability to identify the relationship between
two things and a sufficient number of homogeneous premises and responses can be
obtained, a matching exercise seems most appropriate.
Hence, it is suggested to use only homogeneous material in a single matching exercise: for
example, inventions and inventors, authors and books, or scientists and their contributions.
5.2.2 Advantages and Limitations of Matching Item
Advantages;
• The major advantage of the matching exercise is its compact form, which makes it
possible to measure a large amount of related factual material in a relatively short time.
• Maximum coverage of objectives stated at the knowledge level in a minimum amount of
space/time; especially suitable for who, what, when, and where questions.
• This efficiency occurs because both the premises and the alternatives are, in most cases,
stated at the word or phrase level. Hence, item sampling is higher.
• Are easy to score accurately and quickly.

Limitations;
• The impracticality of getting a sufficient number of homogeneous materials that are
significant from the viewpoint of our objectives and learning outcomes.
• It is generally restricted to the measurement of factual material based on rote learning.
• Usually poor for diagnosing student strengths and weaknesses.
• Difficult to construct since parallel information is required.
5.2.3 Suggestions for Constructing Matching Items
1. Use only homogeneous material in a set of matching items (i.e., dates and places
should not be in the same set) to reduce the possibility of guessing the correct answers.
A homogeneous material is a set of questions and responses that focus on the same basic
theme or idea. For instance, if you are asking about Ethiopian Olympic heroes and the
respective Olympic tournaments where they won medals, the exercise should not be
mixed with names of Ethiopian war heroes (patriots) and the respective battlefields where
they defeated aggressors. The reason is that the direction given in the matching exercise
limits students to selecting their responses on a specific criterion (in the case below, the
cities that hosted Olympic tournaments). Hence, they have no room to consider
battlefields as potential responses.
Direction: Match the names of cities that hosted Olympic tournaments given under
column B with Ethiopian Olympic heroes who won gold medals given under
column A. Write the letter that represents your answer on the answer sheet.
Column A Column B
1. Abebe Bikila A. Atlanta
2. Fatuma Roba B. Barcelona
3. Mirutse Yefitir C. Gura
4. Mamo Wolde D. Mexico
5. Derartu Tulu E. Moscow
6. Ras Alula Abba Negga F. Munich
G. Rome
H. Seoul
As the direction indicates, Ras Alula Abba Negga was an Ethiopian war hero rather than an
Olympic gold medalist. His presence makes the item heterogeneous rather than homogeneous;
thus, his name and the option Gura must not be in this matching item.

Direction: In Column A are five diagrams. In Column B are the names of different figures.
Match the name of the figure with the diagram that best shows it by placing the
letter from Column B in the appropriate space under Column A. Each option under
Column B may be used once, more than once, or not at all.
COLUMN A                              COLUMN B
1. [diagram]                          A. Circle
2. [diagram]                          B. Cone
3. [diagram]                          C. Cube
4. [diagram]                          D. Cylinder
5. [diagram]                          E. Parallelogram
                                      F. Pyramid
                                      G. Rectangle
                                      H. Sphere
                                      I. Square
                                      J. Trapezoid
(The five diagrams of Column A are not reproduced in this text version.)

This matching exercise is more homogeneous than the previous one, because it deals
with geometric figures and their names. The chances of getting the correct answer by
guessing are very low.
2. Supply directions that clearly state the basis for the matching, indicating whether
a response can be used once, more than once, or not at all, and stating where the
answer should be placed. This avoids ambiguity and confusion and saves testing time.
3. Include an unequal number of responses and premises. If the lists are the same
length, the last choice may be determined by elimination rather than knowledge.
4. Place the shorter responses in Column B, i.e., on the right. Putting the shorter
responses in Column B saves time: the student reads the longer item first in Column A
and then searches quickly through the shorter options to locate the correct alternative.
5. Make sure that there are never multiple correct responses for one stem (although a
response may be used as the correct answer for more than one stem).
6. Arrange items in the response column in some logical order: alphabetical,
numerical, or chronological (see the sketch after these suggestions). If options are
organized alphabetically or numerically, students do not waste time searching for the
correct response. This is especially important if there are many options.
7. Use limited number of the items within each set. A brief list of items is advantageous
to both the teacher and the student. From the teacher’s standpoint, it is easier to

maintain homogeneity in a brief list. From the student’s view point a brief list enables
them to read the responses rapidly and without confusion. Approximately four to seven
items in each column seems best. There certainly should be no more than ten items in
either column.
8. Place all the items of one matching exercise on the same page. This prevents
the disturbance created by students switching the pages of the test back and forth.
It also prevents them from missing responses that appear on another page, and
generally adds to the speed and efficiency of test administration.
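
Several of the above suggestions (unequal lists, short lists, alphabetical ordering of
responses) can be checked mechanically when a matching exercise is typed up. Below is a
minimal Python sketch; the content echoes Matching Test II in the self-test exercises at the
end of this unit, and the extra response is invented for illustration:

    from string import ascii_uppercase

    premises = ["Behaviorism", "Gestalt", "Psycho-analytic", "Structuralism"]
    responses = ["Founded by J.B. Watson",
                 "The whole is more than the sum of its parts",
                 "Founded by Sigmund Freud",
                 "Focuses on identifying structures of conscious experience",
                 "Focuses on functions of mind"]

    # Suggestion 3: more responses than premises, so the last match
    # cannot be found by simple elimination.
    assert len(responses) > len(premises)
    # Suggestion 7: keep the lists brief (roughly four to seven premises).
    assert 4 <= len(premises) <= 7

    responses.sort()  # suggestion 6: alphabetical order of responses

    for number, premise in enumerate(premises, start=1):
        print(f"{number}. {premise}")
    for letter, response in zip(ascii_uppercase, responses):
        print(f"   {letter}. {response}")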

SELF CHECK EXERCISES


See the following checklist for constructing good matching exercise. Read each of the
following questions and put a tick mark in the blanks under “Yes” or “No” column.
Yes No
Is this the most appropriate type of item to use?  
Does the matching exercise deal with a homogenous material?  
Are the lists of items to be matched brief?  
Are shorter items placed in the response column?  
Are lists of responses arranged in logical order?  
Is the basis for matching clearly indicated?  
Are all items of the exercise placed on the same page?  
Note: If you have marked any "No" options, the items developed might have problems;
they need to be revised or rewritten.

5.3 MULTIPLE CHOICE ITEM FORMATS


Dear learner, in the previous sections you have learned about the nature, uses, advantages and
limitations, and the suggestions to be followed to develop different objective type items. In
this section, you will learn about the most commonly and widely used form of objective type
items: multiple-choice items. Throughout your schooling, you have had a great deal of
experience in taking multiple-choice items.

What do you think is the reason that multiple choice items are used most frequently
from kindergarten to higher learning institutions?
According to Ebel & Frisbie (1991), multiple choice items have long been the most highly
regarded and widely used form of objective test items. They are adaptable to the

measurement of most important educational outcomes of knowledge, understanding and
judgment, ability to solve problems, and ability to make predictions. Almost any
understanding or ability that can be tested by means of any other item form, short answer,
completion, true-false, matching or even essay, can also be tested by means of multiple-
choice items. This flexibility, combined with the higher quality of items usually found in the
multiple-choice form, has led to its extensive use in achievement testing.

5.3.1 Description about Multiple choice item Format


The multiple-choice item consists of two distinct parts:
1. The first part that contains the task or problem is called stem of the item. The stem of
the item may be presented either as a question or as an incomplete statement. The form
makes no difference as long as it presents a clear and a specific problem to the examinee.
2. The Second part presents a series of options or alternatives. Each option represents
possible answer to the question. In a standard form one option is the correct or the best
answer called the keyed response and the others are misleading or foils called
distracters.

Parts of a multiple-choice item:
The capital city of Ethiopia is ____________.   <- the stem (the problem)
A. Addis Ababa                                  <- the correct (keyed) answer
B. Nairobi
C. Khartoum                                     <- distracters or foils
D. Mogadishu
(Options A-D together are the alternatives, choices, or options.)

Direct Question: What is the capital city of Ethiopia?


A. Nairobi C. Addis Ababa
B. Khartoum D. Mogadishu
Incomplete Sentence: The capital city of Ethiopia is __________.
A. Nairobi C. Addis Ababa
B. Khartoum D. Mogadishu
The number of options used differs from one test to the other. An item must have at least
three answer choices to be classified as a multiple-choice item. The typical pattern is to have

four or five choices to reduce the probability of guessing the answer. A good item should
have all the presented options look like probable/plausible answers at least to those
examinees who do not know the answer.

Can you explain the similarities in the structure of Matching and Multiple-Choice
Questions?

Multiple-choice items are considered better than all other items that can be scored objectively,
for the following reasons:
1. The multiple-choice item is the most flexible of the objective type items. It can be used to
appraise the achievement of any educational objectives that can be measured by a paper-
and pencil test except those relating to skill in written expression and originality.
2. An ingenious and talented item writer can construct a multiple-choice item to measure a
variety of educational objectives from rote learning to more complex learning outcomes
like comprehension, interpretation, application of knowledge and also those that require
the skills of analysis or synthesis to arrive at the correct answer.
3. Moreover, the chances of getting a correct response by guessing are significantly reduced.
However, good multiple-choice items are difficult to construct. A thorough grasp of the
subject matter and skillful application of certain rules is needed to construct good
multiple-choice items.
5.3.2 Advantages and Limitations of Multiple-Choice Items
Advantages,
The following are some of the advantages of multiple-choice item formats.
1. Learning outcomes from simple to complex with the exception of synthesis level can be
measured with this item format. Hence, it is the most flexible of all item formats.
2. Highly structured and clear tasks are provided.
3. A broad sample of items can be measured as scoring is simple.
4. Incorrect alternatives provide diagnostic information, as student errors can be identified
when students are repeatedly attracted to a particular option.
5. Scores are less influenced by guessing than in true-false items, since the options increase
from 2 to 4 (see the short computation after this list).
6. Scoring is easy, objective, and more reliable than with essays.

7. Avoids the absolute judgments found in true-false tests. This works particularly well
when students select the better option from those given.
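
To put advantage 5 in numbers (an illustration, not from the module itself): on a test scored
one point per item, a blind guesser's expected score is the number of items times the chance
of hitting the key. A quick Python sketch:

    # Expected score from blind guessing, one point per item.
    n_items = 40
    for n_options, label in [(2, "true-false"), (4, "four-option multiple choice")]:
        expected = n_items / n_options  # each guess succeeds with chance 1/n_options
        print(f"{label}: expected score {expected:.0f} out of {n_items}")
    # Doubling the options from 2 to 4 halves the expected guessing score.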

Limitations
The multiple-choice items, despite having advantages over other items, have some serious
limitations as well.
1. Constructing good items is time consuming particularly as finding plausible distracters is
frequently difficult.
2. Despite its flexibility, this format is ineffective for measuring some types of
problem-solving skills and the ability to organize and express ideas. This is due to the fact
that real-world problem solving involves proposing a solution rather than selecting a
solution from a set of alternatives.
3. Scores can be influenced by reading ability. This happens when test developers include
difficult vocabularies. Hence, they place a high degree of dependence on the student’s
reading ability and the instructor’s writing ability.
4. There is a lack of feedback on individual thought processes. As with other objective
items, it is difficult to determine why individual students selected incorrect responses.
5. Though less severe than in true-false items, it still encourages guessing.

5.3.3 Suggestions for Constructing Better Multiple-choice Questions


The following are a few of the guidelines to be followed for preparing better multiple-choice
items:
1. Present a definite, explicit, and singular question or problem in the stem. Examinees
should not have to depend on the alternatives to know what is being asked, nor should the
item be a collection of unrelated true-false statements dealing with the same general topic.
Example: Psychology:
A. Is the systematic study of human behaviour
B. Was founded by Socrates
C. Uses drug to help maladjusted individuals
D. An old discipline aging 1000 years.
In the above example, it is not clear without looking at the options whether the question
concerns the definition of psychology, its founder, or its scope. Hence, it should be
improved as:

What is Psychology? It is the systematic study of ________________.
A. Human behaviour
B. Soul and spirits
C. Drugs for curing mental problems
D. Social problems
2. The stem should be written either in direct-question form or in incomplete-statement
form. When possible, state the stem as a direct question rather than as an incomplete
statement; this communicates the item in the clearest and most natural way.
For instance:
Poor: A hormone that is labelled as ‘fight or flight’ is called_______________.
Better: What is the name of a hormone labelled as ‘fight or flight’?
3. Eliminate excessive verbiage or irrelevant information from the stem. This reduces
the chance of complex sentences from acting as barrier to clear communication of the
problem.
Example:
The man who was fired twice from college but through working tirelessly wrote a
book that is currently regarded as the bible of natural science was a British citizen.
His works had influenced thinking and practices in Politics, Literature, Psychology
and Philosophy. This man developed a theory of evolution whose basic tenet is
survival of the fittest. His name was:
A. Darwin
B. Newton
C. Lamarck
D. J. Adams
What the teacher wanted to ask was 'Who developed the theory of evolution whose basic
tenet is survival of the fittest?' But he uneconomically included excessive verbiage that
made the item unclear.
4. The stem of the item should present only one problem. Two concepts must not be
combined together to form a single stem.
5. Include as much of the item in the stem and keep options as short as possible: this
leads to economy of space, economy of reading time and clear statement of the problem.
Hence, include most of the information in the stem and avoid repeating it in the options.
For example, if an item relates to the association of a term with its definition, it would be
better to include the definition in the stem and several terms as options, rather than to
present the term in the stem and several definitions as alternatives. Include in the stem
any word(s) that might otherwise be repeated in each alternative.
Example:
Poor: According to Ethiopian constitution, the prime minister is elected:
A. By members of the parliament
B. By members of the winning party
C. By members of the house of federation
D. By members of the council of ministers
This question can be rephrased as:
Better: According to Ethiopian constitution, the prime minister is elected by members of the:
A. Parliament
B. Winning party
C. House of federation
D. Council of ministers
6. Use negatively stated stems sparingly. When used, underline and/or capitalize the
negative word. This is to prevent students from reading negatively stated statements
positively. There are times when it is important for the examinee to detect errors or to
know exceptions. For these purposes, sometimes the use of ‘not ‘or ‘except’ is justified
in the stem.
7. Make all alternatives homogeneous, plausible, and attractive to the less knowledgeable or
less skilful students. If this is not done, a less prepared student can easily get the correct
answer by cancelling the incorrect options, since they are poorly related to the question
and the correct answer. If an examinee who does not know the correct answer is not
distracted by a given alternative, that alternative is not plausible and it will add nothing to
the functioning of the item.
Example:
Poor: Which of the following politicians is considered the father of South Africa?
A. Haile G/Selassie
B. Barack Obama
C. Nelson Mandela
D. Michael Jackson
This item could be improved by eliminating the options given under letters A, B, and D and
substituting names of individuals who are famous in South African politics, like Oliver
Tambo, Oliver Regan, Jacob Zuma, etc.
The following are some suggestions given to help one in creating good distracters.
A. Base distractors on the most frequent errors made by students in homework
assignments or class discussions related to that concept.
B. Use words in the distractors that are associated with words in the stem (for example,
explorer-exploration).
C. Use concepts from the instructional material that have similar vocabulary or were
used in the same context as the correct answer.
D. Use distractors that are similar in content or form to the correct answer (for example,
if the correct answer is the name of a place, have all distractors be places instead of
using names of people and other facts).
E. Make the distractors similar to the correct answer in terms of complexity, sentence
structure, and length.
8. Make the alternatives mutually exclusive.
Example:
Poor: The daily minimum required amount of milk that a 10 year old child should drink is
A. 1-2 glasses.
B. 2-3 glasses.
C. 3-4 glasses.
D. at least 4 glasses.
This item could be improved as:
Better: What is the daily minimum required amount of milk a 10-year-old child should drink?
A. 1 glass.
B. 2 glasses.
C. 3 glasses.
D. 4 glasses
9. Make alternatives approximately equal in length. In order to make one statement
unarguably correct, teachers tend to lengthen the answer statement with modifiers and
qualifiers. This gives a clue that lets test-wise students guess.
10. Alternatives should be consistent with the grammatical and syntactical construction of the
stem. Grammatical inconsistency provides irrelevant clues.
11. Avoid options that have verbal association with the correct answer option. Similar to
matching exercises verbal associations can give clue to the correct answer.

Example: The psychological process by which the human brain processes the sensory data
collected by the sense organs is _________________.
A. Sensation C. Emotion
B. Perception D. Learning
In the above example, test wise students can easily know the correct answer by
identifying verbal association between sensory data (from the stem) and sensation (from
the alternatives).
12. There should clearly be only one correct or best answer to every item. In addition,
alternatives like "all of the above", "none of the above", and "A and B" should be
used sparingly in correct-answer questions, as they do not serve as distracters as
effectively as the separate choices do. Furthermore, they must not be used in best-answer
questions, where students are expected to select the option that is higher in degree of
correctness compared to the others.

Dear learner, writing test items that have a best-answer response alternative is
usually more difficult than writing those that offer a single absolutely correct response
alternative and clearly incorrect distractors. For best-answer test items, an evaluative
judgment must be made by the writer to determine the worth of each response alternative
in relation to the information given in the item’s stem. More than one response
alternative will contain information that is relevant or correct; however, one of these
should be more complete or more relevant to the specific information given in the item’s
stem.
13. Correct answers should follow a random pattern, appearing at each position
approximately equal numbers of times (a small sketch after these suggestions illustrates
one way to check this). While constructing multiple-choice items, some examiners have a
tendency to place the correct alternative in the first position, some in the middle, and
others at the end. Such tendencies should be consciously controlled. The correct answer
should also not be noticeably longer or shorter than the distracters.
14. Like Matching items if there is a logical sequence in which the alternatives can be
arranged (alphabetical if a single word, in order of magnitude if numerals, in temporal
sequence, or by length of response), use that sequence.
15. Multiple-choice items should be independent. That is, an answer to one question
should not depend on the answer to another question.
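
As a practical aid for suggestion 13, the following minimal Python sketch (the item content
is hypothetical) shuffles the alternatives of each item and tallies where the keyed answer
lands, so the test writer can verify that correct answers are spread over all positions. Where
a logical order exists (suggestion 14), sorting would replace the shuffle for that item.

    import random
    from collections import Counter

    # Hypothetical mini item bank; in each option list the keyed answer
    # is written first, before shuffling.
    items = [("What is the capital city of Ethiopia?",
              ["Addis Ababa", "Nairobi", "Khartoum", "Mogadishu"]),
             ("Which planet is nearest to the sun?",
              ["Mercury", "Venus", "Mars", "Jupiter"])]

    random.seed(7)  # a fixed seed keeps the draft reproducible
    tally = Counter()
    for stem, options in items:
        key = options[0]
        random.shuffle(options)  # suggestion 13: let the key fall anywhere
        letter = "ABCD"[options.index(key)]
        tally[letter] += 1
        print(stem)
        for label, option in zip("ABCD", options):
            print(f"  {label}. {option}")

    print("Key positions:", dict(tally))  # check the spread is roughly even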

SELF TEST CHECKLIST

Finally, the following review will help you check the quality of your multiple-choice items.
Read each of the following questions and put a tick mark in the blanks under the "Yes" or
"No" column.

Yes No
Is this the most appropriate type of item to use?  
Does the stem of the item present a definite problem?  
Is the item stem free of irrelevant material?  
Are negative statements used only under necessary conditions?  
Are all the alternatives grammatically consistent with the stem of the item?  
Does each item have only one correct or clearly best answer?  
Are all distracters plausible?  
Are verbal associations between the stem and the correct answer avoided?  
Are correct and incorrect alternatives equal in length?  
Do correct answers appear in all the alternative positions approximately
equal numbers of times, and randomly?  
Are “none of the above” and “all of the above” used only sparingly?  

5.4 Context-Dependent (Interpretive) Item Format

INTRODUCTION

Dear learner, so far you have been dealing with the most common objective-type items.
Nevertheless, those are not the only ones we ever use. Teachers can also use another form
of objective-type item: the interpretive exercise. Have you ever heard of them?
Probably yes. They are frequently used in mathematics, geography, language, and science
subjects.

Dear learner, what is a context-dependent item? Indicate its basic difference from other
objective item formats.
5.4.1 Description of Interpretative Item Format
An interpretive exercise consists of objective test items that depend on common data. The
data could be a reading passage, a map, a picture, a figure, a chart, a table, etc. Because items
in interpretive exercises depend on an external source, they are called context-dependent items.

Interpretive exercises can measure a broad range of outcomes ranging from simple
knowledge to application level.

Interpretive exercise makes it possible to control the amount of factual information given to
students. We can give them as much or as little information as we think desirable in
measuring their achievement of a learning outcome. In measuring their ability to interpret
mathematical data, for example, we can include the formulas needed or require the students
to supply them. In other areas, we can supply definitions of terms, meanings of symbols, and
other facts or expect students to supply them. This flexibility makes it possible to measure
various degrees of proficiency in any particular area.

The selection of context or stimuli is made in accordance with the nature of the
discipline/subject and the learning outcome to be measured. They are labeled as context
dependent or interpretive as the questions in this item format are answered after interpreting
the information given in the form of the following contexts:
- Paragraphs
- Graphs, Tables and charts like histograms, bar graphs, pie charts etc.
- Maps, Diagram, Picture and photographs
They are a bridge between constructed-response and selected-response item formats, as they
require students either to supply or to select the correct answers. They are used to measure
higher-level learning achievement such as comprehension, interpretation, extrapolation,
application, reasoning, analysis, etc. More specifically, they are dominantly used to assess
students' ability to formulate and test hypotheses, to draw warranted conclusions from data,
to recognize inferences in their own thinking and in the thinking of others, and to apply
principles or generalizations to solve problems.

FORMS AND USES OF THE INTERPRETIVE EXERCISES


There are so many forms and uses of the interpretive exercise that it is impossible to illustrate
all of them. In the subsequent paragraphs, we shall see examples of this item type.
For what Purpose do we use Interpretive Exercises?
Interpretive exercises are useful since they lend themselves to, and place more emphasis on,
the measurement of understanding, interpretation, and evaluation.

Paragraph as a Context
In English tests, teachers use paragraph as a context to measure learning outcomes relating to
reading comprehension i.e. understanding meaning/theme of the paragraph, understanding
contextual meanings of words, relating and synthesizing various parts of information given in
a paragraph etc.

Maps, Diagram, Picture as a Context


The questions using diagram may measure not only knowledge but understanding and
application as well. It may also require the examinee to label various specified parts of the
diagram, or even ask about their functions. On the other hand, reading and interpreting graphs
is the ability that can be useful in most social and physical sciences.

In Geography exams, teachers give the average monthly rainfall amount of a given city for 12
months and ask examinees to calculate the annual average and range of the city’s rainfall
amount based on the data.
In Economics exams, teachers give a graph showing demand and supply curves and ask
examinees to interpret the relationship between the two and to predict the rate of price
shift based on the graph.

The questions are presented using other objective formats like true/false, short answer,
matching, multiple choice, etc. A variety of multiple-choice items may be used to measure
higher-level learning achievement such as comprehension, interpretation, extrapolation,
application, reasoning, and analysis, and to help students focus more on the items/test.

5.4.2 Advantages and Limitations of Interpretive Exercise


The interpretive exercise has several advantages as well as limitations. In the following
section we will see some of the advantages and disadvantages.
Advantages
• They help to assess more complex learning outcomes than any single objective item.
• As a related series of items can be developed, interpretive exercises can tap greater skills
in depth and breadth. Related to this, it is economical as teachers use the same material
for developing many questions.
• It is a more direct measure of improvement in writing skills in language courses.
• The structuring of the problem assists both examiner and examinee. Both approach the
problem with the same frame of reference, which should help reduce ambiguity.

• Complex material can be measured with a series of different items based upon a single
introductory passage, graph, chart, or diagram.
• In contrast to the essay, complex achievement is measured in a more structured situation,
but objective scoring is employed.
• The introductory material makes interpretive exercises capable of measuring the ability to
interpret written materials, charts, graphs, maps, pictures, and other communication
media encountered in everyday situations.
• It measures more complex learning outcomes than can be measured with single objective
test item. Based on a common set of data, greater depth and breadth can also be obtained
in the measurement of intellectual skills.
• The interpretive exercise minimizes the influence of irrelevant factual information on the
measurement of complex learning outcomes. Students may be unable to demonstrate their
understanding of a principle merely because they do not know some of the facts
concerning the situation to which they are to be applied. That is, in other objective test
items students may fail to show their understanding due to lack of detailed factual
information not directly pertinent to the purpose of the test. This is so because in the
introductory materials, we can give students the common background of information
needed to demonstrate understanding, thinking skills, and problem-solving abilities.

Limitations
1. It is laborious and time intensive as well as difficult to construct unlike other item
formats.
2. It is difficult to find appropriate introductory material that has the clarity, precision,
and conciseness often required.
3. Examinees that lack sufficient reading skills may perform poorly particularly when the
context used is paragraph.
4. In comparison with the essay questions, the interpretive exercise has two limitations as a
measure of complex learning (Gronlund, 1985):
a. First, the results indicate to the teacher only that the student is or is not able to
function at higher levels, not whether the pupil has the ability to integrate these
skills in a different situation.
b. Second, the interpretive exercise indicates only whether the pupil can recognize
the answer-not whether he can supply evidence to demonstrate his problem-
solving skills and his organizational ability. "To measure the ability to define

problems, to formulate hypotheses, to organize data, and to draw conclusions,
supply procedures such as the essay test must be used" (Gronlund, 1985, p. 206).

5.4.3 Suggestions for Constructing Better Interpretive Items


Making good interpretive exercises requires having good introductory materials and list of
items that depend on them. Therefore, the following suggestions relate to these two points.
1. Select introductory material that is relevant to the objectives of the course.
According to Linn and Gronlund (2000), great care must be taken in the selection and use of
the introductory material. The success in the purposes of interpretive exercises depends
largely on the introductory material, because this provides the common basis for the test
items. If the introductory material is too simple, the exercise may become a measure of
general information or simple reading skill. On the other hand, if the material is too complex
or unrelated to instructional goals, it may become a measure of general reasoning ability.
Both extremes must be avoided.
2. Select introductory material that is appropriate to the students' curricular
experience and reading level.
When you write introductory materials, you have to put it in a form that is familiar to
students. For example, if your students have never seen a chart you should avoid the use of
charts as your introductory materials.

Linn & Gronlund (2000) also suggested using the least demanding type of introductory
material when various types would serve the purpose equally well and all are familiar to the
students. There is a general suggestion that pictorial materials be used with elementary
school students and a combination of pictorial and verbal materials with higher-grade students.

In addition, the language level or demand of the introductory material should be carefully
seen so that it will not impede students from demonstrating their interpreting skills.
3. Select introductory material that is new to students.
For interpretive exercises to measure complex learning outcomes, the introductory material
should normally be new to the students, though not entirely unfamiliar in form. Linn &
Gronlund (2000) recommended using materials that are similar to those used in class but
vary slightly in content and form. Such materials can be obtained by modifying texts from
textbooks, newspapers, news magazines, statistical reports, article abstracts, etc.

4. Select introductory material that is brief but meaningful
To reduce the demand on reading skill, introductory materials should be brief (short).
However, care must be taken not to make the material short at the expense of relevant
elements: in striving for brevity, one may omit elements that are crucial from the viewpoint
of the learning outcomes being measured, or end up with incomplete and uninteresting
introductory material.
5. Revise introductory material for clarity, conciseness, and greater interpretive value
Often introductory materials before revision do not fit our purpose. Therefore, we need to
make some modifications on them. For example, we may need to delete some elements from
the material so that we can use them for our questions.
6. Construct test items that require analysis and interpretation of introductory
material
According to Linn and Gronlund (2000) two errors in interpretive test items are common.
These are
a. items that can be answered directly in the introductory material, and
b. those that can be answered without reading the introductory material.
Interpretive exercises should be reserved for higher-order objectives. Items should neither
require outside knowledge beyond what is presented in the exercise nor call for
interpretations that merely restate the factual information provided. In both cases, test items
should depend on analysis and interpretation of the introductory material.
7. Make the number of test items roughly proportional to the length of the
introductory material
This is to say that you have to get the most out of interpretive exercises. It is wasteful to
construct only two or three test items for a lengthy introductory material. Though it is
impossible to specify the exact number of items to be constructed from a given introductory
material, there is a general understanding that an interpretive exercise should ideally have
brief introductory material and a relatively large number of test items.
8. In constructing test items for an interpretive exercise, consider the suggestions given
for constructing objective test items.
Because the items in interpretive exercises are objective test items, they have to satisfy
requirements of objective test items.

SELF TEST CHECKLISTS
It is time to check your understanding of writing interpretive test items. Read each of the
following questions and answer them by checking in one of the boxes under alternatives
'Yes' or ‘No’.
Yes No
Can you list different types of interpretive test items?
Can you state the uses of interpretive test items?
Can you state advantages and limitations of interpretive test
items?
Can you list the guidelines in constructing interpretive test
item types?
Can you evaluate the qualities of different interpretive type
items?
Is there any box that you marked under 'No'? If there is any, please go back to your text and
read about it before you go to the next unit.

UNIT SUMMARY
In this unit, we have learned about the nature of selected response test types, their uses,
advantages and limitations, and suggestions given by experts to construct quality items.
• In schools it has been a dominant tradition of using selected response tests for assessing
students’ learning. They expect students to select the correct/best answer after giving
them lists of options rather than expecting them to produce their own answers like
constructed response item formats. This form of test includes item formats like True/False
(Alternate response), Matching, Multiple choice and Context dependent (Interpretive)
item formats.
• In True-False item format, students are expected to judge whether a certain declarative
statement is True/False, Correct or Incorrect, Right or Wrong, Valid or Invalid etc.
Though ease of construction is an advantage, it is the item format most susceptible to
guessing: any blind guess has a 50% chance of being correct.
• Matching exercises have two sides: the premises (question side) and the responses (the
options or alternatives side). Having one list of alternatives serve all the questions is a big
advantage, since producing plausible alternatives is a difficult task for teachers. However,
the dependency on homogeneous material limits it to measuring only a lesser portion of
the course.
• The multiple-choice format is the most versatile of all selected-response item formats, as
it can measure instructional outcomes at all levels of the cognitive domain except the
synthesis level. However, finding four plausible alternatives for each and every question
is a limitation in producing items. Multiple-choice items can be presented in best-answer
or correct-answer form. In the former case, the options given are all correct, but the
student must select the better or best one; in the latter case, all the distracters or foils are
incorrect, and the student selects the single correct option.
• While the other types of selection response items in most instances are confined to
measuring simple learning outcomes, multiple choice items can be adapted to measure a
broad range of outcomes ranging from simple to complex levels.
• The name interpretive (context-dependent) item format itself implies that students are
expected to interpret data or answer questions after understanding the context in which
the question is presented. Teachers therefore present questions using contexts such as
paragraphs; graphs; tables; charts such as histograms, bar graphs and pie charts; maps;
diagrams; and pictures and photographs. The questions in this format are presented using
the selected-response item formats above, or even short-answer and essay formats. Even
though such introductory materials make the questions interesting, poor reading ability
may interfere with direct measurement of the subject matter. Interpretive exercises are
well suited to measuring achievement of complex learning outcomes by providing a
common set of data to all students.
• All selected-response test items have their own strengths as well as weaknesses. There are
also suggestions or guidelines that can improve the quality of selected-response test items.
SELF TEST CHECKLISTS
It is time to check your understanding of writing objective test items. Read each of the
following questions and answer them by checking in one of the boxes under alternatives
'Yes' or 'No’.
Yes No
Can you list different types of objective items?
Can you state the uses of different objective test items?
Can you state advantages and limitations of different
objective test items?
Can you list the guidelines in constructing different
objective item types?
Can you evaluate the qualities of different objective type
items?
Is there any box that you marked under 'No'? If there is any, please go back to your text and
read about it before you go to the next unit.
SELF TEST EXERCISES
Direction I: The following are True/False items. Identify their problems.
1. All broad-leafed trees lose their leaves in winter.
2. Usually, states have a senate and a house of representatives.
3. The Declaration of Independence was probably written with a steel-pointed pen.
4. The Sears Tower in Chicago is a very tall building.
5. If the speaker of the House of Representatives is not present this is not a reason to deny a
motion to adjourn the meeting of congressmen.
Answers to Direction I
1. It used the specific determiner 'All', which signals that the answer is 'False'.
2. It used the specific determiner 'usually', which gives a clue that the answer is 'True'.
3. It is a trivial item, as it asks about the nature of the pen rather than about the Declaration's content.
4. It used the vague term 'very tall', which is difficult to measure.
5. It used a double negative and is rather long, which may give a clue that the answer is 'True'.
Direction II: The following are matching exercises. Identify their limitations on the basis
of the guidelines for constructing better matching exercises.
Matching Test I
Column A Column B
1. Washington A. 16th president
2. Concord B. 1st president
3. Jefferson C. Site of revolutionary battle
4. J.Q. Adams D. Wrote the Declaration of Independence
5. A. Lincoln E. His father was also president
Matching Test II.
Column ‘A’ Column ‘B’
1. Behaviorism A. Focuses on identifying structures of conscious experience
2. Psycho-analytic B. The whole is more than the sum of its parts
3. Gestalt C. Founded by Sigmund Freud
4. Structuralism D. Founded by J.B. Watson
5. Functionalism E. Focuses on functions of mind
Answers to Matching Self-Test Exercises
Matching Test I
In the first matching exercise, directions on how to select the answers were not given to
students. Moreover, for 5 questions only 5 options were given. In addition, the longer
statements should have been placed under Column A. Above all, the items do not measure a
homogeneous idea.
Matching Test II
As in the first matching exercise, in the second exercise directions were not given, the
numbers of questions and options are equal, the longer items are found under Column B,
and the items are not homogeneous (some require the founders and others the definitions of
the perspectives). Unique to this exercise, however, there is a verbal association between
premises and responses that gives a clue to the correct answer (for instance, 'functions of
mind' and 'Functionalism').
Multiple Choice Test
Direction: Identify the weaknesses of the following multiple-choice items.
1. To be an insect a creature must have:
A. Not more than 4 legs
B. A body limited to head and abdomen
C. No wings
D. A body that includes three distinct parts: head, thorax and a functional abdomen
2. The one that is vegetable is:
A. Maize
B. Apple
C. Orange
D. Cabbage
3. Trees
A. Are classified as hard wood and pith wood
B. Are gymnosperms or angiosperms
C. Create carbon monoxide
D. Grow only in temperate zones
4. The health of an economy is hard to discern, but one indicator of the health of an
economy that is often used by economists and many writers of financial columns in the
newspaper and by stock market analysts is the rate of employment in a state or locality.
The rate of employment is:
A. Calculated by the number of jobs divided by the number of persons who do not
have a job
B. Determined by the demand for labor at the available wage
C. Based on the number of members in all unions divided by those who are not
unionized.
5. Which is not true about multiple-choice test? They
A. Are applicable to all subject-matter areas in schools
B. Tap a large number of points of information in a unit
C. Eliminate guessing among the sets of options
D. Can be written only at the foundational content level
6. Which of the following statement describes a problem with multiple-choice items?
A. Not all distracters are equally plausible
B. The demand of reading is usually too low
C. There are too many higher order items
7. In Illinois the temperature all week has reached 85 degrees Fahrenheit. It is likely that
people in:
A. Argentina is warm and the fields are green
B. South Africa are wearing coats and heavy sweaters
C. There is 11 inches of snow this morning in Quebec
8. The largest expenditure by the federal government goes for:
A. Maintaining the military establishment
B. Services to citizens (e.g social security)
C. Gifts to the undeveloped foreign nations
D. All of the above
Answers for Multiple Choice Items
1. Alternative 'D', which is the correct answer, gives a clue to students by being longer than
the rest of the options.
2. Alternative 'A' is not a plausible distracter because maize is a cereal rather than a fruit.
Using fruit names is advisable, as students usually confuse fruits with vegetables.
3. The stem is not clear by itself. To answer the question, examinees must depend on
skimming the options.
4. There is excessive verbiage in the stem, which diverts the item from measuring economics
knowledge to measuring reading skill. Moreover, the alternatives are limited to three,
which raises the chance of a correct blind guess from 25% to 33%.
5. There are specific determiners that indicate an absolute degree of occurrence in options
'A' (the term 'all'), 'C' (the term 'eliminate') and 'D' (the term 'only'). This gives a clue
to test-wise students and hence makes these options poor distracters.
6. Options 'B' and 'C' are not plausible distracters, as they describe strengths (benefits) of
the item format rather than the problem asked about. Moreover, the negative term 'Not'
should have been underlined or written in bold. Lastly, as in question 4, only three
options are given, so the item shares the same guessing problem as item 4.
7. The options are not logically related to the stem.
8. Since the stem uses the superlative term 'the largest', only one option can be correct, so
the item should not have included the option 'All of the above'.
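To see where the guessing figures in answers 4 and 6 (and the 50% figure for True-False items in the unit summary) come from: on an item with k equally attractive options, a blind guess succeeds with probability 1/k. A quick worked check:

\[ P(\text{correct blind guess}) = \frac{1}{k}: \qquad k = 2 \Rightarrow 50\%, \qquad k = 3 \Rightarrow 33\%, \qquad k = 4 \Rightarrow 25\% \]

This is why dropping a four-option item to three options raises the chance of a correct blind guess from 25% to about 33%.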
CHAPTER SIX
CONSTRUCTED RESPONSE ITEM FORMAT
Dear learner, in Unit Five you studied selected-response test items, and we hope you have
identified some of their weaknesses. Constructed-response items can minimize some of the
weaknesses you saw in the selected-response test types. In this unit, you will examine
different types of constructed-response test items. As indicated in the previous section,
constructed-response item formats are so called because they require students to develop
their own answers rather than select from the lists of alternatives given.
Unit Objectives
After completion of this unit, you will be able to:
• Indicate the difference between short answer and completion questions.
• Explain the advantages and limitations of short answer/completion item types.
• Criticize short answer/completion items and essay items developed by others.
• Understand the main advantages and limitations of essay questions and common
misconceptions associated with their use.
• Distinguish between learning outcomes that are appropriately assessed by using essay
questions and outcomes that are likely to be better assessed by other means.
• Construct valid and reliable essay items that assess given objectives, using the
guidelines.
• Evaluate the qualities of essay tests items using the guidelines prepared by others.
• Elaborate the difference between the analytic (point) and holistic (rating) methods of scoring essay items.
Dear learner, can you explain why constructed-response items are also called an alternative
assessment item format? Why do you think constructed-response items emerged?

Why has the alternative assessment item format emerged as an assessment form?
Interest in the use of alternative types of assessment has grown rapidly both as a response to
dissatisfaction with multiple-choice and other selected-response tests and as an element in a
systemic strategy to improve student outcomes. Alternative assessments range from written
essays to hands-on performance tasks to cumulative portfolios of diverse work products.
Item formats included in this assessment form include oral presentations, essays, oral exams,
exhibitions, demonstrations, performances, products, research papers, poster presentations,
capstone experiences, practical exams, supervised internships and practicum.
According to the National Center for Research in Vocational Education (NCRVE), cited in
Rahn et al. (1995), alternative/constructed-response assessment types are categorized into
four forms:
1. Written assignments
2. Performance tasks
3. Portfolios
4. Senior Projects (research papers, projects, oral presentations)
Written assessments are activities in which the student composes a response to a prompt. In
most cases, the prompt consists of printed materials (a brief question, a collection of
historical documents, graphic or tabular material, or a combination of these). However, it
may also be an object, an event, or an experience. Student responses are usually produced “on
demand,” i.e., the respondent does the writing at a specified time and within a fixed amount
of time.
There are two types of written assessments. The first consists of open-ended questions
requiring students to answer short-answer and restricted essay questions. The required answer
might be a word or phrase (such as the name of a particular piece of equipment), a sentence
or two (such as a description of the steps in a specific procedure), or a longer written response
(such as an explanation of how to apply particular knowledge or skills to a situation). They
make very limited cognitive demands, asking students to produce specific knowledge or
facts.
The second type of constructed-response written assessment includes extended essays,
problem-based examinations, and scenarios. These items are like open-ended questions,
except that they typically extend the demands made on students to include more complex
situations, more difficult reasoning, and higher levels of understanding.
A detailed discussion of short-answer and essay assessments (both extended and restricted) is
given below.
6.1 Short-Answer Item Formats
These are suitable for measuring a wide variety of relatively simple learning outcomes. They
can be used for measuring knowledge of terminology, knowledge of specific facts,
knowledge of principles, knowledge of procedures or methods, and simple interpretations of
data.
Is there any difference between the two forms of short-answer items, direct question and
completion? Where does the difference lie? Did you try to answer these questions? If so,
that is great. There is a slight difference between the two forms.
6.1.1 Descriptions of Short-Answer Item Formats
Short-answer items are of two types: direct question and completion. They are identical in
that both require students to supply their answers in symbols, words, numbers or phrases
rather than select the correct answer from a list of alternatives. However, they differ in the
form of sentence in which they are stated: short-answer questions are stated in interrogative
form (usually using WH-question words like what, when, where, etc.), whereas completion
items are stated as incomplete statements.
The following example clarifies the difference:
What is the capital city of Ethiopia? (Short answer question)
The capital city of Ethiopia is called______________. (Completion question)
Uses of Short Answer (Supply) Items
• The short-answer item is especially useful for measuring problem-solving ability in
science and mathematics.
• Complex interpretations can be made when the short-answer item is used to measure the
ability to interpret diagrams, charts, graphs, and pictorial data.
Note: When short-answer items are used, the question must be stated clearly and concisely,
be free from irrelevant clues, and require an answer that is both brief and definite.
6.1.2 Advantages and Limitations of Short-Answer Items

Advantages of Short-Answer Items
• The short-answer test item is one of the easiest to construct, since teachers are not
expected to prepare plausible alternatives or build items from homogeneous material.
• In the short-answer item the student must supply the answer; this reduces the possibility
that the examinee will obtain the correct answer by guessing.
• Useful in assessing mastery of factual information when a specific word or phrase is
important to know.
Limitations of Short-Answer Items

They have major limitations:
A. They are not suitable for measuring complex learning outcomes. With the exception of
mathematics and science, they are mostly used for measuring instructional objectives
stated at the knowledge level.
B. Scoring is inefficient and time-consuming. Rather than scoring letters that represent
students' answers, teachers must read in full each student's answer stated as a phrase or
word. Misspellings and lack of neatness complicate the problem further.
C. Unless the question is carefully phrased, many answers of varying degrees of correctness
must be considered for total or partial credit. Hence they are difficult to score.
6.1.3 Suggestions for Constructing Short-Answer Items
1. Word the item so that the required answer is both brief and specific.
For instance, look at the following example:
Addis Ababa is found in_______________________.
This question can be answered in multiple ways (Ethiopia, the former Shewa Province,
Oromia National Regional State, East Africa, Africa) and hence cannot be scored
efficiently. To prevent multiple responses that create scoring inefficiency, it is better to
phrase the item specifically.
2. A direct question is generally more desirable than an incomplete statement. It presents the
question in a more natural way.
3. Do not take statements directly from textbooks to use as a basis for short-answer items;
lifted out of context, such statements often become ambiguous.
4. If the answer is to be expressed in numerical units, indicate the unit of measurement in
which responses are to be expressed; otherwise multiple answers may occur.
Example: 10 km is equal to __________
Are students supposed to answer in terms of miles, meters, millimeters, or what?
5. Blanks for answers should be equal in length. Short and long blanks may give students a
clue about whether the answer is a word or a phrase.
6. Do not include too many blanks. Too many blanks make the question vague and
encourage rote learning.
Example: ______________, _________________, ___________ and ________ are ____
7. Avoid using grammatical clues to the correct response. The use of articles like ‘a’/‘an’
may give a clue to the correct answer.
Example: A subatomic particle with a negative electric charge is called an ____________.
The student could easily eliminate the options proton, neutron, and meson as possible
responses, as they do not start with a vowel. Hence, she can easily answer 'electron'.
The improved version will be: A subatomic particle with a negative electric charge is called
a(n) ___________.
The rephrased item is better, as it lessens the chance of guessing by accommodating answers
that start with either a vowel or a consonant.
8. If possible, put the blank space at the end of a statement rather than at the beginning.
Asking for a response before the student understands the intent of the statement can be
confusing and may require more reading time.
SELF TEST CHECKLISTS
See the following checklist for constructing short answer and completion items. Read each of
the following questions and put a tick mark in the blanks under “Yes” or “No” Column.
‘Yes’ ‘No’
Is this the most appropriate type of item to use?  
Have you avoided taking statements directly from textbooks?  
Has the degree of precision been specified for numerical  
answers?
Are blanks for answers equal in length?  
Are too many blanks in an item avoided?  
Are blank spaces put at the end of the item?  
Are irrelevant clues avoided?  
Are only the important words omitted?  
If you have ticked any 'No' option, the items are poor and need revision.
6.2 Essay Item Format
According to Linn and Gronlund (2000), essay tests allow freedom of response. Students
are free to select, relate, and present ideas in their own words. But this freedom is a matter
of degree: in some instances it is delimited to a specific size; in other cases, no restriction is
put. Based on the extent of freedom, essay tests can thus be classified into restricted-response
and extended-response essay tests.
6.2.1 Description of the Essay Item Format
An essay question is "… a test item which requires a response composed by the examinee,
usually in the form of one or more sentences, of a nature that no single response or pattern of
responses can be listed as correct, and the accuracy and quality of which can be judged
subjectively only by one skilled or informed in the subject."
What is an Essay Question?
The distinctive feature of the essay question is freedom of response. Pupils are free to select,
relate and present ideas in their own words. The students have the responsibility of thinking
out the answers to the questions asked, and the freedom to express those answers in their
own words.
An essay question is a test item which contains the following four elements:
- Requires examinees to compose rather than select their response.
- Elicits student responses that consist of one or more sentences.
- No single response or single response pattern is correct.
- The accuracy and quality of students’ responses to essays must be judged subjectively
by a competent specialist in the subject.
Essay tests have two variations:
Based on the limit they put on students' responses, essay items are classified into extended
and restricted.
A. Restricted Essay - in this form of essay, the student's response is limited in terms of
either form or content. Limitation in terms of form refers to the limit put on the size of
the response, such as in how many pages, paragraphs, or sentences students present their
answers. Limitation in terms of content refers to a limit stated in terms of subject-matter
elements, such as how many causative factors, or how many consequences of erosion,
students must list. The following examples make the difference visible:
Explain the consequences of soil erosion on the Ethiopian economy. Present your answer
in not more than a page (limitation in terms of form, i.e., in not more than a page).
Explain four major consequences of soil erosion on the Ethiopian economy. (Limitation
in terms of content, i.e., four major.)
B. Extended Essay - contrary to the restricted essay, in this form there are limitations on
neither content nor form. The above examples become extended essays if written in the
following manner:
Explain the consequences of soil erosion on the Ethiopian economy.
Write your own evaluation of the value of the new pre-service Teacher Education
System Overhaul (TESO) in the preparation of qualified, well-trained secondary school
teachers.
What are the major distinctions between selected response items and constructed
response items? What are the weaknesses of selected response type items? How do essay
items overcome the weaknesses of selected response type tests?
The following are some of the strengths and weaknesses of essay items.
When should essay questions be used?
Essay questions should be used to assess higher-order thinking skills that cannot be
adequately assessed by objectively scored test items: in other words, when the task requires
the student to organize, integrate, and synthesize knowledge; to use information to solve
novel problems; or to be original and innovative in problem solving. There are also
secondary factors to be considered if essay questions are to be used:
• When a teacher has sufficient resources and/or help (time, teaching assistants) to
score the student responses to the essay question(s).
• When the teacher is short of time for preparing a larger number of objective items.
• When the group to be tested is small.
• When a teacher is more confident of his/her ability as a critical and fair reader than as
an imaginative writer of good objective test items.
What advantage in general do you think constructed response items have over selected
response ones? What limitation do you think in general constructed response items have as
compared to selected ones?
6.2.2 Advantages and Limitations of Essay Items
In order to use essay questions effectively, it is important to understand the following
advantages, limitations and common misconceptions of essay questions.
What are the major advantages and limitations of essay questions?
Strengths of Essay Items
1. Essay items provide an effective way of assessing complex learning outcomes that
cannot be assessed by other commonly used paper-and-pencil assessment procedures.
Essay questions allow you to assess students' ability to synthesize ideas, to organize, and
express ideas and to evaluate the worth of ideas. They are unique in measuring students’
ability to select, organize and integrate concepts and present them in logical prose.
2. Essay questions allow students to demonstrate their reasoning. Essay questions not
only allow students to present an answer to a question but also to explain how they
arrived at their conclusion. This allows teachers to gain insights into a student's way of
viewing and solving problems. With such insights teachers are able to detect problems
students may have with their reasoning process and help them overcome those problems.
3. They are easy to construct. Essay questions are easier to construct than the multiple-choice
format, which demands producing four or more options for a large number of questions,
because teachers do not have to create effective distracters. That does not, however, mean
that good essay questions are easy to construct. They may be easier in a relative sense,
but they still require a lot of effort and time. Essay questions that are hastily constructed
without much thought and review usually function poorly.
4. They have a good effect on students' learning. Some research seems to indicate that
students are more thorough in their preparation for essay questions than in their
preparation for objective examinations like multiple-choice tests. In other words, asking
questions at the analysis, synthesis and evaluation levels makes students creative,
critical, and analytical readers.
5. They present a more realistic task to the student. In real life, questions will not be
presented in a multiple-choice format, but will require students to organize and
communicate their thoughts.
6. Because students are expected to supply rather than select answers, guessing is reduced.
This problem does not exist in the same form with essay questions, because students must
generate the answer rather than identify it from a set of options. At the same time, essay
questions introduce bluffing, another form of guessing: some students are adept at using
various methods of bluffing (vague generalities, padding, name-dropping, etc.) to add
credibility to an otherwise vacuous answer. Thus, essay questions change the nature of
the guessing that occurs but do not eliminate it.
Limitations of Essay Items

1. Essay items sample a smaller percentage of the content covered in class. This happens
because scoring students' answers takes a long time, so only a few questions can be
asked.
2. They are difficult to score objectively and reliably. Research shows that a number of
factors can bias the scoring process:
- Different scores may be assigned by different readers, or by the same reader at different
times, to a single student's response. This is because scorers may value components of
the answer differently (legibility, logical coherence and flow of ideas, language
proficiency, word economy, etc.). Furthermore, mood swings within a scorer can
produce different scores for the same answer sheet within 24 hours.
- A context effect may operate: an essay answer sheet of a second student preceded by a
top-quality paper from the first student receives lower marks than when preceded by a
poor-quality essay.
- Papers that have strong answers to items appearing early in the test and weaker
answers later will fare better than papers with the weaker answers appearing first. This
is called the carryover effect.
- Scores are influenced by the expectations that the scorer has for the student's
performance. If the reader has high expectations, a higher score is assigned than if the
reader has low expectations: if we have a good impression of the student, we tend to
give him/her the benefit of the doubt. This is called the halo effect.
3. They encourage bluffing: when students do not know the correct answer, they fill the
space with anything they feel may answer the question.
6.2.3 Suggestions for Constructing Essay Test Items
Linn and Gronlund (2000) note that the use of essay questions as a measure of complex
learning outcomes requires attention to two important points:
1. How to construct essay tests, and
2. How to score them reliably.
Accordingly, they suggest the following when constructing essay tests.
Students should have a clear idea of what they are expected to do after they have read the
problem presented in an essay item. Below are specific guidelines that can help to improve
writing essay items.
1. Construct questions that are very clear and specific. Failure to establish adequate and
effective limits for the student response to the essay question allows students to set their
own boundaries for their response, meaning that students might provide responses that are
outside of the intended task or that only address a part of the intended task. Therefore, it
is the responsibility of the teacher to write essay questions in such a way that they provide
students with clear boundaries for their response. Moreover, it helps to score questions
more objectively:
Example: Why do individuals get addicted to Facebook?

Better: Why do individuals get addicted to Facebook, according to the behavioral perspective?
In the above example, scoring the second question is very easy, as the student provides
his/her answer in line with the behavioral perspective. In the first question, however,
students may provide diverse responses drawing on alternative perspectives (behavioral,
cross-cultural, psychodynamic, cognitive, humanistic, etc.), which makes preparing a
scoring rubric, and the scoring process itself, difficult.
2. Do not make the questions too many. A relatively large number of questions requiring
short answers is preferred to a few requiring long answers (larger content sampling;
easier to elicit the desired response). If there is a very large number of essay questions,
however, scoring will become tedious and inefficient, particularly when the class size is
large.
3. Let the pupils answer all the questions, i.e., no optional items. Students should not be
permitted to choose one essay question to answer from two or more optional questions.
Following this guideline results in comparing students on the same measurement scale.
Otherwise, students are in effect taking different tests, since they are working on
different questions.
What is the problem of taking different tests by the students in one classroom? Does
it have any disadvantage?
The use of optional questions should be avoided for the following reasons:
o Students may waste time deciding on an option.
o Some questions are likely to be harder which could make the comparative
assessment of students' abilities unfair.
o The use of optional questions makes it difficult to evaluate if all students are
equally knowledgeable about topics covered in the test.
4. Let the questions set cover the objectives and the subject matter specified in the test
blue print. This will enhance its content validity. If the ability to apply principles is being
measured, for example, the questions should be phrased in such a manner that they call
forth that particular behavior.
5. Indicate the mark or point value for each question and, if possible, also the time limit
for answering each question, for example: "Question 2 (10 points; suggested time: 15
minutes)". Specifying the relative point value and the approximate time limit helps
students allocate their time wisely across several essay questions, because the directions
clarify the relative merit of each question. Without such guidelines, students may feel at
a loss as to how much time to spend on a question. When
deciding the guidelines for how much time should be spent on a question keep the slower
students and students with certain disabilities in mind. Also make sure that students can
be realistically expected to provide an adequate answer in the given and/or the suggested
time.
6. Restrict the use of essay questions to learning outcomes that cannot be satisfactorily
measured by objective test items. Some types of learning outcomes can be more
efficiently and more reliably assessed with selected-response questions than with essay
questions. In addition, some complex learning outcomes can be more directly assessed
with performance assessment than with essay questions. Therefore, the use of essay
questions should be reserved for learning outcomes that cannot be better assessed by
some other means.
7. State the criteria for grading
Students should know what criteria will be applied to mark their responses. To obtain
adequate reliability, it is necessary to prepare a scoring guide beforehand. As long as the
criteria are the same for the grading of the different essay questions, they don’t have to be
repeated for each essay question but can rather be stated once for all essay questions.
ACTIVITY
Direction: For each pair of essay questions below, determine the better question and
justify why.
1. A. Compare and contrast the U.S. government's foreign policy towards Africa under
William Bill Clinton and George W. Bush administration. Include in your discussion
economic considerations, and political sentiments.
B. How is the U.S. government’s foreign policy towards Africa under William Bill Clinton
and George W. Bush administration?
2. A. What should be done regarding the legality of abortion in our country?
B. Abortion is intended to be made legal in some countries. Are you in favor of, or
opposition to, making it legal in our country? In your answer include the impact of
various social groups, economic and social consequences.
6.2.4 Scoring Essay Items
As has been mentioned, the effectiveness of essay tests as measures of educational
achievement depends primarily on the quality of the grading/scoring process. The
competence of the scorer is crucial to the quality of this process, yet even competent graders
may inadvertently do things that make the results less reliable than they ought to be. This
unreliability of scoring can be minimized by implementing the following suggestions:
1. Prepare an outline of the expected answer in advance.
The outline contains the major points to be included, the characteristics of the answer
(e.g., organization) to be evaluated and the amount of credit to be allotted. Preparing a
scoring key provides a common basis for evaluating students’ answer that helps the
scorer to use the key as a yardstick and be consistent when scoring each question.
2. Use the scoring method which is most appropriate.
There are two major approaches to scoring essay questions: the analytic (point) method and
the holistic (rating) method.
i. Rating/holistic scoring method: With the rating method, the teacher is generally
more interested in the overall quality of the answer than in specific points. Rating is
done by simply sorting papers into piles as the answers are read. These piles represent
degrees of quality and determine the credit assigned to each answer. If eight points are
allotted to the question, for example, nine piles might be used, ranging in value from
eight points to zero. This method involves considering the student's answer as a
whole and judging its total quality relative to other students' responses, or against
certain criteria that you develop.
ii. Analytic (point) scoring method: In the point method, each answer is compared with
the ideal answer in the scoring key and a given number of points is assigned according
to the adequacy of the answer. Before scoring, prepare an ideal answer in which the
major components are defined and assigned point values. Read and compare the
student's answer with the model answer(s), or at least with an outline of the major
points that should be included. If all the necessary elements are present, the student
receives the maximum number of points; partial credit is given based on the elements
included in the answer. To arrive at the overall exam score, the instructor adds the
points earned on the separate questions. The method enables the teacher to focus on
one characteristic of a response at a time. The criteria or rubrics may include content,
organization, word selection, accuracy/reasonableness, completeness, originality and
so on. With the point scoring method, specific feedback can be given to the student.
An example of a point-scoring rubric is given below. Each criterion is rated Poor (0-1),
Satisfactory (2-3), or Strong (4-5):
1. Clear, interesting, and informative introduction, summary, and conclusion.
2. Each paragraph has a main idea.
3. Identifies and explains social forces.
4. Explains different perspectives.
5. Authors' views are clearly recognised.
6. Appropriate information.
7. Effective use of details and examples.
8. Connections with current issues.
9. Satisfies writing requirements.
Generally, restricted essay questions can usually be satisfactorily scored by the point method.
The restricted scope and the limited number of characteristics included in a single response
make it possible to define degrees of quality precisely enough to assign point value. The
extended response question, however, usually requires the rating method.
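To make the mechanics of the point method concrete, here is a minimal sketch that totals one student's rubric scores. The criterion names and the scores assigned are hypothetical illustrations, not part of any fixed standard:

```python
# A minimal sketch of analytic (point) scoring: each criterion is rated
# on the rubric's 0-5 scale and the points are summed into a total.
# Criterion names and scores below are hypothetical examples.

MAX_PER_CRITERION = 5

student_scores = {
    "introduction, summary and conclusion": 4,
    "main idea in each paragraph": 3,
    "identifies and explains social forces": 5,
}

total = sum(student_scores.values())
maximum = MAX_PER_CRITERION * len(student_scores)
print(f"Total: {total}/{maximum}")  # Total: 12/15
```

Because each criterion is scored separately, the same record can be handed back to the student as specific feedback, which is the advantage of the point method noted above.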
3. Decide on provisions for handling factors that are irrelevant to the learning
outcomes being measured
The content of an answer should be evaluated in terms of the accuracy, completeness, and
relevance of the ideas expressed. The form of the answer (clarity, organization, and correct
mechanics such as legibility of handwriting, spelling, sentence construction and punctuation)
should be judged separately from the subject matter, and only if prior instructions to that
effect were given. The scorer should be careful not to let such factors, irrelevant to the
learning outcomes being measured, interfere with scoring.
4. Evaluate all answers to one question before going on to the next question
If there is more than one essay question on the test, grade each question separately for all
students rather than grading a student's entire test at once. Otherwise, a brilliant performance
on the first question may overshadow weaker answers to the other questions (or vice versa).
The reasons are elaborated below.
One factor that contributes to unreliable scoring of essay questions is a shifting of standards
from one student's answer to the next. A paper with average answers may appear to be of
much higher quality when it follows a failing paper than when it follows a near-perfect one.
One way to minimize this is to score all answers to the first question, reorder the papers,
score all answers to the second question, and so on until all the questions have been scored.
A more uniform standard can be maintained with this procedure because it is easier to
remember the basis for judging each answer, and answers of various degrees of quality can
be more easily compared. It also helps to counteract another type of error: when we evaluate
all of the answers of a single student, the first few answers may create a general impression
of the student's achievement that affects our judgment of the remaining answers. Thus, if the
first answers are of high quality, we tend to overrate the following answers; if they are of low
quality, we tend to underrate them. We call this condition a carryover effect: our impression
of the answer to one item affects our scoring of the next.
Carry over effect is the tendency of the scorer to rate the following items based on the
impression he/she formed from the previously rated item. If a student did well on the first
item, the teacher will give high scores for the next items though the answers may be poor,
or vice versa.
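The procedure above (score all answers to one question, reorder the papers, move to the next question) can be sketched in a few lines of code. This is only an illustration of the bookkeeping, under the assumption that papers carry an anonymous code instead of a name; the toy keyword scorer stands in for a real scoring key:

```python
import random

def score_exam(papers, num_questions, score_answer):
    """Score papers one QUESTION at a time, reshuffling the pile between
    questions so no paper is always read after the same neighbour."""
    totals = {p["code"]: 0 for p in papers}
    for q in range(num_questions):
        random.shuffle(papers)           # new reading order per question
        for p in papers:                 # all answers to question q ...
            totals[p["code"]] += score_answer(q, p["answers"][q])
    return totals                        # ... before moving to q + 1

# Toy scorer: one point per expected keyword found (illustration only).
keys = {0: ["erosion", "economy"], 1: ["deforestation"]}
scorer = lambda q, text: sum(word in text.lower() for word in keys[q])

papers = [
    {"code": "S01", "answers": ["Erosion hurts the economy.", "No idea."]},
    {"code": "S02", "answers": ["Unrelated answer.", "Deforestation is a cause."]},
]
print(score_exam(papers, 2, scorer))     # {'S01': 2, 'S02': 1}
```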
When possible, evaluate the answers without looking at the students’ names. The general
impression we form about each student during our teaching is also a source of bias in
evaluating essay questions. This is called the halo effect.
Halo effect is a tendency on part of the scorers to allow their general impressions of a
person to influence their evaluation of specific behaviors.
This is a tendency on the part of scorers to allow their general impressions of a person to
influence their evaluation of specific behaviors. If a teacher believes a student is clever, it is
not uncommon to give a high score to a poorly written answer, rationalizing that "the student
is really capable, even though he/she didn't express it clearly".
5. Evaluate the answers without looking at the pupil’s name.
Remove or cover the names on the papers before beginning scoring. In this way you are more
likely to rate papers on their merits rather than on your overall impression of the student. The
general impression we form about each pupil during our teaching is a source of bias in
evaluating essay questions. Where possible, the identity of the pupils should be concealed
until all answers are scored.
6. If especially important decisions are to be based on the results, obtain two or more
independent ratings.
Sometimes essay questions are included in tests used to select pupils for awards,
scholarships, special training, and the like. In such cases, two or more competent persons
should score the papers independently and their rating should be compared. After any large
discrepancies have been satisfactorily arbitrated, the independent ratings may be averaged for
more reliable results.
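As a small illustration of this suggestion, the sketch below averages two independent ratings but flags papers where the raters differ by more than a chosen gap; the two-point threshold is an assumption for the example, not a rule from the text:

```python
def combine_ratings(r1, r2, max_gap=2):
    """Average two independent ratings; return None to flag the paper
    for arbitration when the raters differ by more than max_gap."""
    if abs(r1 - r2) > max_gap:
        return None                 # large discrepancy: arbitrate first
    return (r1 + r2) / 2

print(combine_ratings(7, 8))        # 7.5  -> close enough, average
print(combine_ratings(3, 9))        # None -> discuss before averaging
```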
SELF TEST CHECKLIST
The following review will help you assess the quality of your essay test items. Read each of
the following questions and put a tick mark in the blanks under “Yes” or “No” Column.
Yes No
Is this the most appropriate type of item to use?  
Are the questions meant to measure higher - order objectives?  
Do the questions measure the intended learning outcomes?  
Are the questions phrased in such a way that students know what is being asked?  
Are students told the bases on which their answers will be evaluated?  
Are generous time limits provided for responding to the questions?  
Is the time limit for each question indicated?  
Is the point value of each question indicated?  
SUMMARY
• Essay test items are suitable for measuring learning outcomes that cannot be measured by
objective tests.
• Essay tests are of two types: restricted response essay tests and extended response essay
tests. In restricted response essay tests, the form and size of response are restricted. In
extended response essay tests, in contrast, students are free both in how to organize their
answers and in determining length of their responses.
• While essay tests are easy to construct and have a desirable effect on students' study
habits, they are difficult to score and do not allow large content sampling. Scoring
problems can be minimized if the scoring guidelines are strictly followed.
SELF TEST EXERCISES
Direction: Identify the limitations of the following short answer items.
1. ____________________, ______________, ________ and _______ are __________.
2. Italians were defeated in _______________
3. Your textbook says ‘Ethiopia is found in _________part of Africa’
4. ____________was born in Bekoji and became Olympic gold medallist in___________
5. The outcome of building the grand renaissance dam for Ethiopia is ________.
Answers for Self-Test Exercises
1. There are too many blanks mutilating the statement, and thus the question lacks clarity.
2. It is not clearly indicated whether students are asked for the place or the year in which
the Italians were defeated. Beyond this, it lacks clarity, as it has not indicated the country
the Italians fought against.
3. This item encourages students to be overly dependent on textbooks and to ignore
reference books.
4. The item shows two weaknesses. First, the blank spaces for answers should have come at
the end of the question rather than at the beginning. Second, there are multiple correct
answers, which hinders scoring efficiency (speed).
5. This is an essay question rather than a short-answer one: in most cases, when students
explain the geo-political, economic and environmental benefits of the dam, they use long
sentences rather than single words.
SELF TEST EXERCISES
Choose an intended learning outcome from a course you are currently teaching and create
an effective essay question to assess students’ achievement of the outcome. Follow each of
the guidelines provided for this exercise. Check off each step on the provided checklist
once you have finished it.
SELF TEST CHECKLIST
Yes No
Clearly define the intended learning outcome to be assessed by the item.  
Avoid using essay questions for objectives that are better assessed with
objectively-scored items.  
Use several relatively short essay questions rather than one long one.  
The task is appropriately defined and the scope of the task is appropriately limited  
Present a novel situation.  
Consider identifying an audience for the response  
Specify the relative point value and the approximate time limit  
Predict student responses.  
Write a model answer.  
Have a colleague critically review the essay question.  
CHAPTER SEVEN
ASSEMBLING, ADMINISTERING, SCORING AND
ANALYZING CLASSROOM TESTS
INTRODUCTION
Dear learner, until now we have been learning how to write different forms of test: selected
response and constructed response test types.

What should we do after we have prepared our test items?
We need to make them ready for students. This involves several tasks: reviewing the test,
writing directions for the test items, and arranging the test items following certain principles.
Once the test items are ready, the test is administered to the testees. Test administration is
governed by certain rules concerning physical and psychological factors. After the test is
administered and scored, the items should be examined in terms of their statistical qualities:
the item difficulty index and item discrimination power. This unit deals with assembling test
items, administering the test, scoring the answers, and analyzing the test items.
Unit Objectives
At the end of this unit, you will be able to:
• state the rule of thumb for arranging items of different types.
• list components of a good classroom test direction.
• state factors to consider in test administration.
• list the advantages of making item analysis.
• determine difficulty level of test items.
• determine discrimination power of test items.
• evaluate the effectiveness of distracters.
7.1 ASSEMBLING TEST ITEMS
Assembling involves recording the test items, reviewing them, arranging the items and the
formats, writing directions, and reproducing the test.
What do we do after we have written the different types of items?
7.1.1 Recording Test Items
When writing items, it is recommended that each test item be recorded on a separate page in
your notebook. It is also recommended that, in addition to the item itself, the content from
which the item is drawn and the specific learning outcome it measures be recorded. This
makes cross-checking against the table of specification easy.
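If the item bank is kept electronically rather than in a notebook, each item can carry the same three pieces of information as a simple record; the field names below are illustrative, not a prescribed format:

```python
from collections import Counter

# One record per item: the item itself, the content it was drawn from,
# and the specific learning outcome it measures (hypothetical example).
item_bank = [
    {"item": "What is the capital city of Ethiopia?",
     "content": "Geography of Ethiopia",
     "outcome": "recalls specific facts"},
]

# Cross-check against the table of specification: count items per cell.
cells = Counter((i["content"], i["outcome"]) for i in item_bank)
print(cells)
```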
7.1.2 Reviewing Test Items
Before tests are made ready for reproduction and administration they should be carefully
reviewed. This is because there are many errors that might be committed when we construct
tests. When we concentrate so closely on some aspect of item construction, we may overlook
(forget) others. This results in a number of unwanted errors that may distort the function of
the item. However, such problems can be detected and minimized by;
i) reviewing the items after they have been set aside for a few days, and
ii) asking a fellow teacher to review and criticize them.
A more careful review of the items can be made by considering them in the light of each of
the following questions.
- Items should measure an important learning outcome included in the table of
specification?
- Items should be free from extraneous clues.
- An item should not be based on the response to another.
- Items should cite authorities for statements that might be considered debatable or
based on opinion.
- Items should be written in a straightforward, simple manner.
- Items should use the simplest method for requiring a correct response.
- Test directions should be clear and complete.
- Items should be grouped by type.
- Items should be grouped according to instructional content.
- Items, within groups, should be arranged in order of increasing difficulty, etc.
7.1.3 Arranging Items in The Test
Once the test items are written, edited, and revised for errors, the next task is to arrange them
in some order. When the final selection of the items has been completed and they are ready to
be assembled into a test, a decision must be made concerning the best item arrangement. The
following suggestions provide guideline for this purpose.
- The items should be arranged so that all items of the same type are grouped together.
Each item type requires a specific set of directions and a somewhat different mental set
on the part of the examinee. Within each group, the items should be arranged in order of
increasing difficulty. Beginning with easy items assists pupils by raising their motivation
and confidence to keep working on the remaining items that follow.
- It may be desirable to group together items which measure the same learning outcomes or
the same subject matter content. The examinee will be able to concentrate on a single
domain at a time rather than having to shift back and forth among areas of content.
Linn and Gronlund (2000) suggest the following arrangement order of items by format;
i. True-False
ii. Matching Exercises
iii. Supply type (Short answer and Completion)
iv. Multiple choice
v. Interpretive exercises
vi. Essay
The above arrangement is based on the assumption that, as one goes down the list, the
difficulty of the items as a group increases. It is then logical to arrange items in order of
increasing difficulty, i.e., beginning with the easiest items and proceeding gradually to the
most difficult ones. Beginning with easy items helps to increase students' motivation and
confidence to work on the items that follow. If, on the other hand, the test starts with a
difficult item, students may lose confidence right from the start and may be likely to miss
even the simple items that follow.
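The two rules (group by format in the Linn and Gronlund order, then easiest first within each group) amount to a simple two-key sort. In the sketch below, the items and the difficulty estimates (0 = easiest) are made-up illustrations:

```python
# Sort items by format order, then by estimated difficulty within format.
FORMAT_ORDER = ["true-false", "matching", "supply",
                "multiple choice", "interpretive", "essay"]

items = [
    {"text": "Explain the causes of soil erosion.", "format": "essay", "difficulty": 0.8},
    {"text": "Water boils at 100 degrees Celsius. (T/F)", "format": "true-false", "difficulty": 0.2},
    {"text": "The capital city of Ethiopia is ____.", "format": "supply", "difficulty": 0.3},
]

arranged = sorted(items, key=lambda i: (FORMAT_ORDER.index(i["format"]),
                                        i["difficulty"]))
for item in arranged:
    print(item["format"], "-", item["text"])
```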
7.1.4 Preparing Directions for the Test
Directions constitute an inseparable part of a test. Sometimes due to problems with clarity of
directions students get confused with regard to how they are supposed to respond to the
questions. The problem of student confusion can be reduced if certain guidelines are followed
in writing test directions. Test directions generally should include:
i) purpose of the test,
ii) time allowed for completing the test,
iii) directions for responding,
iv) how to record the answers,
v) basis for scoring open ended or extended response tests
i. Purpose of the Test
The direction should inform students about the purpose of the test. This includes the course or
the subject the exam is for, whether it is diagnostic, mid semester exam, final examination,
etc. Though this can be done orally when the examination program is set, it is preferable if
the heading of the exam paper says something about the purpose of the test.
For instance, when a final examination is prepared for the course Educational Assessment
and Evaluation, the examination paper will have the following information on the top of the
first page.
Bahir Dar University
College of Education and Behavioral Sciences
Department of Psychology
Assessment and Evaluation of Learning (PGDT 423)
Final Examination (50%)
Date: 22-10-2015 E.C.
Time allowed: 1:40 hrs.
Name ______________ ID No. _______ Dept. _______

General Directions: In this test paper there are four parts: true-false, matching, short answer,
and multiple-choice type items. Write your answers on the blank spaces given. For the
selection-type items use CAPITAL LETTERS only.

(In the module's layout, callout labels on this sample point to the purpose of the test, the time
allocated for the test, how to record responses, and the directions for responding.)
ii. Time Allowed for Completing the Test
It is important to inform students how much time is given to the entire test. In addition, it is
good if students are informed about length of time to be allocated to each section in the test.
This enables students to effectively use their time. It also rescues less able students from
spending unnecessarily long time on questions that appear difficult to them.
As to the length of time students should be allowed for a given test, there is no fixed limit. It
is suggested that time should not constrain students from responding to the items. This does
not, however, mean there should be no limit at all. The rule of thumb is 1 minute per two
true-false items, or per one multiple-choice item, or per one short-answer item.
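Applied to a concrete paper, the rule of thumb works out as below. The mix of items is a made-up example, and essay questions, which the rule does not cover, must still be budgeted separately:

```python
def estimated_minutes(true_false=0, multiple_choice=0, short_answer=0):
    """Rule of thumb from the text: 1 minute per TWO true-false items,
    and 1 minute per multiple-choice or short-answer item."""
    return true_false / 2 + multiple_choice + short_answer

# e.g., 20 true-false, 25 multiple-choice and 5 short-answer items:
print(estimated_minutes(20, 25, 5))  # 40.0 minutes
```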
iii. Directions for Responding
How students should respond to the questions should be clearly spelled out. For instance, in
multiple-choice items students should know whether they have to choose the correct answer
or the best answer. When the testees are young children, it is important to include correctly
marked sample items to help them understand how to respond.
iv. How to Record the Answers
This refers to how students indicate their answers and whether they should provide their
answers on the test booklet itself or on a separate answer sheet. In selection-type test items
like multiple choice, students may be instructed to circle, underline, or write the letter of
their choice. For young children, instructing them to underline their answers is preferable.
The number of testees and the length of the test determine where students should record
their answers.
If the number of examinees is small and the test is short, answers can be recorded on the test
paper itself. With a large number of examinees and a long test, on the other hand, it is
preferable to have students record their answers on a separate answer sheet.
v. Basis for Scoring Open-Ended or Extended-Response Tests
If there are several essay items, it is suggested that we indicate in the directions the weight
of each item. In addition, we should indicate the relative weight given to each of the
components we want to appear in students' responses to the essay items. For instance, we
need to determine how much credit is given to factual accuracy, organization,
comprehensiveness, originality, etc.
vi. Reproducing the Test
- The test items should be spaced and arranged so that they can be read, answered and
scored with the least amount of difficulty before they are ready for reproduction.
- Most test reproduction in schools is done on photocopying machines. As you well know,
the quality of such copies can vary tremendously. Regardless of how valid and reliable
your test might be, poor copies will make it less so.
- Regardless of the reproduction process selected, it is desirable to proofread the entire test
before it is administered. Charts, graphs and other pictorial materials must be checked
especially carefully to be certain that the reproduction is accurate and the details are clear.
7.2 Administering Tests
The guiding principle of test administration is that there should be a fair chance for all
students to demonstrate their achievement of the learning outcomes being measured. This
means providing a physical and psychological environment conducive to their best efforts,
and controlling factors that might interfere with valid measurement.
i. The Physical Environment
The physical environment should be as conducive as possible: the testing room should be
quiet, with adequate light, ventilation, adequate workspace, comfortable seats, etc. The
question paper should also be reader-friendly, with bold characters, neat, clear and appealing,
not such that it intimidates testees into mistakes. All materials needed for carrying out the
demands of the test should be provided in reasonable number and quality, and on time.
ii. The Psychological Conditions
Psychological conditions influence students' scores even more seriously than physical ones.
They include the mental preparedness of testees to take and pass exams. Any condition that
may result in tension should be eliminated. Some sources of anxiety among students when
taking tests include the following:
i) threatening students with tests if they do not behave in a required way.
ii) warning students to do their best because the test is important.
iii) telling students to work fast in order to complete on time.
iv) threatening students on consequences if they fail.
A teacher may induce anxiety or intense fear to the pupils both by word and deed. Therefore,
students should be assured and reassured that the test results are to be used to help them
improve their learning and the time assigned for the test is enough to complete the test.
Other psychological factors to be considered by the teacher are:
▪ Time of testing: if tests are administered just before "the big game" or "the big
holiday", the results may not be representative.
▪ Individual pupil fatigue, the onset of illness, or worry about a particular problem may
prevent maximum performance.
When teachers observe such problems, they should take measures such as arranging the time
of testing with these factors in mind and permitting postponement when appropriate; this can
enhance the validity of the results. In addition to the physical and psychological factors
mentioned above, there are some practices we need to avoid during test administration.
Things to be avoided are as follows:
i. Don't talk unnecessarily before letting students start working.
First of all, since students are likely to be thinking about the exam, instructions unrelated to
the test may be overlooked by examinees. For instance, many students may fail to grasp and
remember when you have arranged make-up classes if you tell them during test
administration. For some students this may even cause frustration.
ii. Keep interruptions to a minimum.
If we have corrections or related instructions, we should give them at the beginning of the
test. Interrupting the testees now and then with corrections or other instructions is not
advisable, because it may disturb them.
iii. Avoid giving hints to students about individual items.
While taking a test, students may ask you about many things related to the test, such as lack
of clarity, definitions of terms, ambiguity, or unreadable items. In such cases, providing
individual clarification is not advisable. If you believe that an item needs clarification, make
it for all examinees by calling their attention to it. In contrast, if you feel that no correction is
needed, remain silent and do not provide individual clarification: while making clarifications
on an individual basis, you may unknowingly provide unintended clues.
iv. Discourage cheating.
It is not uncommon to see invigilators doing other things while students take exams; for
example, they may score previous tests, work on research proposals, or read books. This
undoubtedly gives students the chance to cheat. Cheating may take different forms: copying
answers from each other, copying answers from scraps of paper, or having persons who do
not belong to the class sit the exam on others’ behalf. In order to get valid results on
students’ achievement or performance, we have to discourage cheating. The best way to
prevent cheating is careful proctoring of examinees. When there is a large number of
examinees, it is advisable to have another person assist you. Another, complementary way
is being careful about seating arrangements.
v. Avoid activities that do not match test administration.
Supervisors or invigilators may frequently engage in other activities, such as reading books,
reading the exam itself, or attending to personal matters. These conditions undoubtedly give
students opportunities to cheat. Under such conditions it might also be necessary to
discourage cheating through special seating arrangements and careful supervision.

7.3 SCORING THE ANSWERS


There are basically three types of scoring: hand scoring, machine scoring, and self-scoring.
Which scoring method to use depends on the availability of scoring equipment and the speed
with which the test results are needed.
A) Hand Scoring - In this type of scoring, the teacher or other persons score the test papers
   of the students. In this case, the teacher prepares the answer key and provides it to the
   scorers.
B) Self-Scoring - This type of scoring is done by the students themselves. The teacher
   provides the answer key to the students, and the students score the papers. In this type of
   scoring, the students can therefore determine their own total score on a test. The
   disadvantage is that students may cheat when scoring their own papers.
C) Machine Scoring - As the name implies, in this type of scoring a scoring machine is
   used. This is quite useful with large numbers of examinees, for example the students
   taking the national examination (EGSCE) all over Ethiopia.

7.4 ITEM ANALYSIS


Dear learner, the purpose of this section is to show you what to do with the results of your
tests. You know that a test is administered to measure students’ achievement. To make sure
whether the test items we used have served as intended or not, we conduct item analysis.
Item analysis can identify items that are inefficient in some way and thus opens the way to
improving or eliminating them; as a result, one can develop a better overall test.

Dear learner, now think that you have administered and scored the tests. What do you do
with the results of the tests? See the following section.

Item analysis is a statistical procedure used to distinguish good items from poor items and
effective distracters from ineffective ones. It is the process of examining students’ responses
to each test item (particularly, multiple-choice items) to determine the quality of the test
items. It is a useful tool in the progressive improvement of a teacher’s classroom tests. It
enables you to determine each item’s difficulty level and discrimination index and to evaluate
how efficiently the distracters played their role. Thus, it enables teachers to select or revise
items for future use.

Why do classroom teachers conduct item analysis? Now, think about the purpose of item
analysis.

The purpose of item analysis is to separate the best items from the poor ones. From an
educational measurement and evaluation viewpoint, it is preferable to use test items that are
judged good even if they were used before; this is where item analysis comes in. In item
analysis we determine which items can be used directly, which can be used with revision,
and which should be discarded. Apart from the above purpose, item analysis serves the
following purposes in classrooms.

a) Item analysis data provide a basis for efficient class discussion of test results
One of the tasks in item analysis is counting the number of times each alternative is chosen
by students as the correct answer. This gives both teachers and students the chance to discuss
misinformation and misunderstandings. Item analysis also helps teachers identify technical
defects. The data may also suggest needed changes in scoring keys or scoring rubrics (in
essays, for instance) in cases where high-achieving students most frequently chose an
alternative you intended as a distracter.

b) Item Analysis data provide a basis for remedial work


Although discussion of test results provides a chance to clarify specific problems, item
analysis also suggests general areas of students’ weaknesses that need more attention. If, for
example, students’ scores are lower than expected, this may suggest that you need to revisit
critical concepts or topics.
c) Item analysis data provide a basis for the general improvement of classroom
instruction.
Item analysis data provide information that can assist in determining the appropriateness of
the learning outcomes and course contents defined for a group of learners. Students’ scores
may even lead one to revise curricula.
d) Item analysis procedures provide a basis for increased skill in test construction
In item analysis, you will identify ambiguity, unintended clues, ineffective distracters, etc.
All this information is useful in revising the items for future use. Teachers who conduct item
analysis are generally better at constructing good test items than those who do not, because
the former get the chance to learn from their own errors.

7.4.1 Procedures of Item Analysis for Objective Test Items
Dear student, you may ask how item analysis is done. Item analysis is carried out based on
the following procedures.
1. First, arrange the scored test papers in order from the highest score to the lowest score.
2. Divide the ordered papers into two groups: put those with the highest scores in one group
   and those with the lowest scores in another. If the number of students (test papers) is less
   than or equal to 40, divide the papers into two equal halves. But if the number of students
   is more than 40, take the top 27% of the test papers as the upper group (high achievers)
   and the bottom 27% as the lower group (low achievers), and leave out the middle. For
   instance, if the number of students in the class is 70 and you, as the teacher, want to
   conduct item analysis, you compute 27% of the papers for each of the upper and lower
   groups that are going to be involved in the analysis:
   70 × 27/100 = 18.9 ≈ 19
   So, we use 19 test papers from the upper group and 19 test papers from the lower group,
   38 papers in all. The remaining 32 test papers found in the middle are not needed for the
   analysis.
3. For each item, count the number of examinees in the upper group and in the lower group
   who chose each response alternative (for completion, short-answer, and true-false
   questions, count the number of students who answered the question correctly) and record
   the counts separately for the upper group and the lower group. Add the counts of the
   upper and lower groups for the correct answer, divide the sum by the total number of
   upper- and lower-group students, and multiply the value by 100%. This gives the index
   of item difficulty (P). The formula is
   P = ((RU + RL) / T) × 100%
   Where;
   T = total number of upper- and lower-group students
   P = item difficulty index
   RU = number of upper-group students who got the item right
   RL = number of lower-group students who got the item right

The difficulty of a test item is indicated by the percentage of pupils/students who got the
item right. When P levels are less than about 25%, the item is considered relatively
difficult. When P levels are above 70%, the item is considered relatively easy. Generally,
test construction experts try to build tests in which most items have P levels between 20%
and 80%, with an average P level of about 50%.
4. Subtract the counts of the lower group from the counts of the upper group and divide the
result by half of the total number of upper and lower group students. This will provide
index of item discrimination (D).
D = (RU − RL) / (T/2)
The interpretation of an item in relation to its discrimination power can be seen as follows:
- Generally, an item is considered as having average discriminating power if its D index is
close to 0.5. An item with maximum positive discriminating power would be one where all
pupils in the upper group got the item right and all pupils in the lower group got the item
wrong.
Example: Let us assume that a 10-item multiple-choice test was administered to 40 students.
         The teacher wanted to conduct item analysis. The results of the students for the first
         item, whose correct answer is B, are presented below.
Alternatives
Item Number 1 A B* C D Omit
Upper Group (20) 1 19 0 0 0
Lower Group (20) 5 9 0 6 0

For the above item, the difficulty level (P) is given as
P = ((19 + 9) / 40) × 100% = 70%
and the discrimination power of the item (D) is given as
D = (19 − 9) / (40/2) = 0.5
(A short Python sketch after step 5 below reproduces these computations.)
As regards the distracters, alternatives A and D were functioning as intended because they
attracted a larger number of students from the lower group than from the upper one. In
contrast, alternative C did not function as intended because it attracted no students.
Therefore, this alternative needs improvement for future use, or the item should have had
only three alternatives.
5. Evaluating the effectiveness of distracters
Distracter effectiveness is determined by inspection or observation; no index needs to be
calculated. A good or effective distracter is one that attracts more students from the lower
group than from the upper group. Thus, it should separate the well-prepared students from
the poorly prepared ones. The main purpose of a distracter is to draw uninformed or
unprepared students away from the correct answer.
Examination of the following item analysis data will illustrate the ease with which the
effectiveness of distracters can be determined by observation.
Group Alternatives
*A B C D
Upper (10) 5 4 0 1
Lower (10) 3 2 0 5
(*) indicates, A is the correct answer
The following are the comments given:
• The item has a positive discrimination index, since 5 students in the upper group and 3 in
  the lower group got the item right. However, the P value is 40%, which is fairly low. This
  may happen due to the ineffectiveness of some of the distracters.
• Option “B” is a poor distracter because it attracts more pupils from the upper group than
  from the lower group. This may be due to some ambiguity in the statement of the item.
• Option “C” is completely ineffective as a distracter because it attracted no one. On the
  other hand, alternative “D” is functioning as intended: it attracted a larger proportion of
  pupils from the lower group.
• One may improve the low discriminating power of the item by removing any ambiguity in
  the statement of the item and revising or replacing alternatives “B” and “C”. Generally,
  item analysis data merely indicate poorly functioning items, not the cause of the weakness.
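
To make the arithmetic above concrete, here is a minimal Python sketch (ours, not part of the
module; the helper names are invented for illustration) that reproduces the worked example’s
difficulty index, discrimination index, and distracter inspection:

def difficulty(ru, rl, t):
    # P: percentage of upper + lower group examinees answering correctly
    return 100 * (ru + rl) / t

def discrimination(ru, rl, t):
    # D: upper-group correct minus lower-group correct, over half of T
    return (ru - rl) / (t / 2)

# counts of students choosing each alternative (correct answer: B)
upper = {"A": 1, "B": 19, "C": 0, "D": 0}
lower = {"A": 5, "B": 9, "C": 0, "D": 6}
t = 40  # total upper + lower group examinees

print(difficulty(upper["B"], lower["B"], t))      # 70.0, i.e., P = 70%
print(discrimination(upper["B"], lower["B"], t))  # 0.5, i.e., D = 0.5

# a distracter functions if it attracts more lower- than upper-group students
for option in ("A", "C", "D"):
    print(option, "effective" if lower[option] > upper[option] else "needs revision")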

7.4.2 Interpreting Item Difficulty and Item Discrimination in Objective Test Items

Though there are no clear-cut guidelines for interpreting items based on their levels of
difficulty and discrimination, there are rules of thumb. The following guidelines are
suggested for determining the ideal difficulty levels of different test formats.

Item Format                      Ideal Difficulty Level (%)
Completion and short answer      50
5-response multiple choice       70
4-response multiple choice       74
3-response multiple choice       77
True-false                       85

As regards item discrimination index, Ebel & Frisbie (1991) suggested the following rule of
thumb.
Index of D Interpretation
0.40 and up Very good item
0.30 to 0.39 Reasonably good but possibly subject to improvement
0.20 to 0.29 Marginal that needs improvement
Below 0.20 Poor items

7.4.3 Cautions in Interpreting Item Analysis Data


When interpreting item analysis data, we have to be cautious. The following are major
cautions.
1. Item analysis data are not analogous to item validity
Item analysis data simply tell you how difficult and how discriminating the items are; they
do not tell you anything about the extent to which the item analyzed corresponds to the
learning outcome of interest. An item that is not a valid measure of a learning outcome may
still have an acceptable level of difficulty and discrimination power.
2. The discrimination index is not always a measure of item quality
Though the discrimination index tells us about the quality of an item, one should not
automatically decide to discard an item because it has low discrimination power. Sometimes
items have low discrimination power because they are so easy that many students from both
the upper and the lower groups answer them correctly. Such items may be purposely retained
to appear at the front of our tests, for a reason we saw in assembling tests. So, as long as they
discriminate positively, we can retain such items. What is not wanted at all is an item that
discriminates negatively.
3. Item analysis data are tentative
The statistics you get from your analysis are not fixed. They can be influenced by the nature
of the group tested, the number of students tested, the instructional procedures employed by
the teacher, and chance errors. This suggests that if you feel you can construct a better item,
it is preferable to construct a new one rather than to reuse an item simply because it was used
before.
4. Avoid selecting test items purely on the basis of their statistical properties
Items should not be selected primarily on the basis of their difficulty level and discrimination
index. Primary importance should be given to the extent to which the item measures the
objective and content of interest.

UNIT SUMMARY
• After they have been prepared, classroom test items should be made ready to be taken by
  students. This includes reviewing the test items in light of the suggestions to be
  considered in constructing the respective types of test items.
• The reviewed test items should be arranged in sections, to allow a common direction for a
  set of items and to make scoring easier. In addition, items should be arranged from
  simple to difficult to reduce fear of failure on the part of examinees.
• Test administration requires that the teacher make the physical environment as well as the
  psychological set-up of the learner conducive for maximum performance on the test. The
physical environment includes silence, ventilation, illumination, and working space,
whereas the psychological factor mainly refers to whether the test taker is mentally relaxed
to take the test.
• Interruptions, talking unnecessarily before exam, cheating and individual clarification are
practices that need to be avoided in test administration.
• Item analysis is used to determine whether items can be used in the future.
• Item analysis data serve the purposes of providing a basis for efficient class discussion of
  test results, for remedial work, and for the general improvement of classroom instruction.
  In addition, item analysis procedures provide a basis for increased skill in test
  construction.

SELF TEST CHECKLIST
It is time to check your understanding of assembling, administering, scoring, and analyzing
classroom tests. Read each of the following questions and answer them by checking in one of
the boxes under alternatives 'Yes' or 'No'.
Yes No
Can you state rule of thumb for arranging items of different variety?
Can you list components of a good classroom test direction?
Can you state factors to consider in test administration?
Can you list the advantages of making item analysis?
Can you determine difficulty level of test items?
Can you determine discrimination power of test items?
Can you evaluate the effectiveness of distracters?
Is there any box that you marked under 'No'? If there is any, please go back to your text and
read about it before you go to the following unit.

SELF TEST EXERCISES


The following are item analysis data. The data indicate the number of students choosing each
alternative; the correct answer is marked by (*). Based on the data, determine the difficulty
level (P) and discrimination index (D) of each item and interpret the results. Also evaluate
the effectiveness of the distracters.
Item Group Alternatives
Number A B C D Omit

1. Upper 27% 8* 5 7 7 0
Lower 27% 2* 9 8 8 0
2 Upper 27% 13 2 10* 2 0
Lower 27% 5 1 9* 12 0
3 Upper 27% 2 20* 3 2 0
Lower 27% 11 2* 8 6 0
4 Upper 27% 0 4 3 20* 0
Lower 27% 0 7 12 8* 0

UNIT EIGHT
DESIRABLE QUALITIES OF GOOD TESTS
INTRODUCTION
Dear learner, in this unit you will be introduced to the most important characteristics of, or
qualities required by, tests and other measuring instruments. The quality of the tests used in
measuring students’ behaviors and academic performance is very important for the decisions
teachers and other educators make about instruction, student progress, and other important
educational processes. In this unit, we present several procedures used by teachers and others
concerned to ensure that the decisions they make, based on the data they gather from their
students using tests and other instruments, are valid and reliable.

Unit Objectives
By the completion of this unit, you will be able to:
• State the meaning of validity and reliability.
• Describe the different types of validity.
• Describe ways to estimate test reliability.
• Justify the need of reliability and validity in test construction.
• Explain the factors that influence reliability of test scores.
• Explain the factors that determine validity of test scores.

A teacher administered the same test to his students two times. The students’ scores
were somehow similar in the two testing conditions. That is, students who got high in
the first testing also scored high in the second, and those who scored low in the first
testing also scored low in the second test. This shows that there is a consistency of
scores. What do we call this consistency of scores? The teacher wanted to measure
students’ reading ability using a test, but the results indicated students’ spelling
ability. What is the problem with this test?

8.1 Validity
Validity is the most important idea to consider when preparing or selecting a test or other
measuring instrument for use. More than anything else, teachers want the information they
obtain through the use of tests to serve their purposes. The drawing of correct conclusions or
making decisions based on the data obtained from an assessment is what validity is all about.
The validity of a test is how well it fulfills the function for which it is being used. Regardless
of the other advantages of a test, if the test lacks validity, the information it provides is
useless. The validity of a test can be viewed as the "correctness" of the decisions or
conclusions made from performance of students gathered through the tests.

Validity has been defined as referring to the appropriateness, meaningfulness, and usefulness
of the decisions teachers make based on the information they collect from their students
using tests and other instruments. The term validity refers to the accuracy of students’ test
results; that is, it addresses the question of how confident we can be that a test actually
indicates a person’s true score on a trait.

8.1.1 Types of Validity


The extent to which a test provides correct information about students’ performance is the
extent to which it provides useful information for making decisions and, hence, has validity
for this purpose. Validity is commonly divided into three major categories or types, to which
face validity is sometimes added.

1. Content Validity
The most relevant type of validity in the measurement of classroom achievement is content
validity. In assessing the content validity of a test, the teacher asks, “To what extent does the
test require the students who have taken it to demonstrate all aspects of the knowledge or
other behavior being measured?” This type of validity refers to the adequacy of the
assessment. According to Whitely (1996), adequate assessment has two components:
1. relevance of the content of the test to the objectives or behavior being measured, and
2. representativeness of the content in the test.
A relevant test assesses only those objectives that are stated in the instruction. For a test to
have high content validity, it should be a representative sample of both the objectives and
contents being measured. The items should deal with parts of the course covered during
instruction. In other words, there should be sufficient number of items from various chapters
or parts of the course treated in the classroom. There should be a match between the items of
the test and the course contents and the objectives of instruction.

For instance, relevant test items ask only about the material the test is supposed to cover, and
not about other materials. Representative test items sample all the topics that are supposed to
be covered by the test. For example, if a test covers three chapters of a textbook, each of
which received equal emphasis in the course, the principle of representativeness requires that
the test contains questions on each chapter, not just one or two, and that each chapter has
equal weight in determining the overall test score.

Dear learner, how does one determine the content validity of a measure?
Content validation is primarily a process of logical analysis. By means of a careful and
critical examination of the items of a measure in relation to the behavior or the purpose of the
tests, one must make the following professional judgments:
1. Does the test content parallel the instructional objectives to be assessed?
2. Do the items of the measure cover different aspects of the course and objectives?
It is crucial that each of the points in question be looked at and studied carefully before the
test is put to use.

If you were asked to carry out a content validity study on some of the national
examinations, what steps would you follow?

2. Criterion - Related Validity


Criterion-related validity indicates whether the scores on a test predict scores on a well-
specified, predetermined criterion. Consider the question, “Do students’ performances on the
ESLCE predict their first-year, first-semester college performance?” Here there are two
things, or variables: students’ ESLCE performance and their college performance. So, our
purpose is to know whether students who are successful on the ESLCE will also be
successful in college. In this case, we say that the ESLCE is a predictor and college
performance is a criterion. The Ministry of Education and higher learning institutions in
Ethiopia (and, of course, elsewhere in most countries of the world) use this type of validity in
selecting and admitting secondary school completers into colleges and universities. Those
students who score high in the 12th-grade national examinations are selected and admitted to
higher learning institutions; the idea is that those who score high in the national examination
will be successful in college education. This kind of decision making rests on criterion-
related validity. In this type of
validity, the specification of the criterion used becomes crucial, and this specification is
usually based on professional judgment. The value of criterion-related validity, then, depends
on the appropriateness and quality of the criterion.

There are two types of criterion-related validity: concurrent validity and predictive validity.
Concurrent validity uses correlation coefficients to describe the degree of relationship
between the scores of two tests given to the same students at about the same time. A high
relationship suggests that the tests are assessing something similar. This type of evidence is
used in the development of new standardized tests or other measuring instruments that
measure – in a different, perhaps more efficient, way – the same thing as an old instrument. A
new test of intelligence, for example, may be correlated with such established instruments as
the Wechsler intelligence test. Wechsler intelligence tests are old tests. If a new intelligence
test has been developed by a test expert, the test expert may administer both the new and
Wechsler intelligence tests to the same group of students on the same day. Then the expert
will determine the correlation between the test results of the students on both tests. The
correlation between the two test scores should be high because the two tests are measuring
the same behavior, that is, intelligence. The purpose of doing this is usually to substitute one
test for another. In this case, the criterion selected is the Wechsler intelligence test.

In predictive validity, the data on the criterion variable are collected sometime after the data
on the predictor variable are collected. Take the case of ESLCE and college performance.
Students take the national examination in March. Their college performances are collected
almost one year later in February. In both concurrent and predictive procedures, a test is
related to another criterion measure.

3. Construct Validity

Dear learner, have you ever seen intelligence with your naked eyes? What does it look like?
We have never seen it. However, we talk about intelligence all the time. We say that a
student is intelligent or not intelligent. What are our grounds for saying that?
The term construct refers to a psychological construct, a theoretical conceptualization about
an aspect of human behavior that cannot be measured or observed directly (Ebel and Frisbie,
1991). Construct validity is an interpretation or meaning that is given to a set of scores from
tests that assess a behavior or construct that cannot be measured directly, such as measuring
an unobservable trait like intelligence, creativity, or anxiety. For example, we use tests to
measure intelligence. Intelligence is a variable that we cannot directly observe; we infer it
from the students’ test scores. Students who score high on the test are said to be intelligent.
Of all the types of validity, construct validity is the most difficult to obtain.

Construct validity involves the use of both content-related evidence and criterion-related
evidence. That is, the test developed should properly represent items from domains that are
supposed to be indicators of the construct. In addition, there should be evidence that scores
from the test can predict the construct.

4. Face Validity
Strictly speaking, face validity is not a major type of validity. It refers to the degree to which
the content of the test looks valid, or the extent to which a test appears to measure what it is
intended to measure, to those, for example, teachers who prepared the test items or who
administer and/or score the test (Worthen et al., 1999). Face validity may not be as important
as content validity, criterion-related validity, or construct validity from a measurement
perspective. However, if a measurement tool does not look valid, people’s (for example,
examinees’) trust in the test will be low.

Generally, the validity types we have seen above are not equally relevant for different types
of tests with different purposes. For classroom tests, because objectives and contents can be
clearly spelled out, content validity matters most. In cases in which you use tests for
prediction purposes, criterion-related validity becomes essential. Validity is clearly the single
most important aspect of tests or other measuring instruments and of the scores that result
from them.

8.1.2 Factors Affecting Validity


A number of factors influence validity of a test. The following are some of them.
1. Unclear Directions: If the directions do not clearly indicate how to respond to the test
   items, the validity of the test is reduced.
2. Reading Vocabulary: If the reading vocabulary demanded is too difficult, students fail
   to respond to the test items even when they know the answers. The test then becomes a
   reading comprehension test for them, and validity decreases.
3. Difficult Sentence Construction: If a sentence is constructed in a way that is difficult to
   understand, students will be confused, which affects the validity of the test.
4. Poorly Constructed Test Items: These also reduce the validity of a test.
5. Use of Inappropriate Items: Objective-type items cannot judge a pupil’s ability to
   organize material, and written test items cannot judge a pupil’s pronunciation. The use of
   such inappropriate item types lowers validity.
6. Medium of Expression: English as the medium of expression and response for non-
   English-medium students creates serious problems: such tests, instead of measuring
   learning outcomes in a subject, primarily measure knowledge of English, which
   ultimately affects validity.
7. Difficulty level of items: items that are too easy or too difficult will not provide reliable
discrimination among pupils and therefore lower validity.
8. Inadequate time limits: If the time is too short then the test will be a speed test rather
than the test to measure the intended domain. Hence, it affects test validity.
9. Test too short: a test is only a sample of the many questions that might be asked. Thus, a
test which is too short fails to provide a representative sample of performance.
10. Influence of Extraneous Factors: Extraneous factors like style of expression, legibility,
    spelling, handwriting, length of the answer, method of organizing the material, and the
    halo effect influence the validity of a test. For example, if a student scores lower marks
    due to a lack of neatness while answering a test item on algebraic equations, the score
    does not tell the truth about the student’s algebra ability and is therefore not valid.
 Activity
1. Name, define and explain the two aspects of content-related validity.
2. Name, define and explain the two aspects of criterion-related validity.
3. Indicate for each of the following whether it describes content validity, predictive validity,
concurrent validity, or construct validity.
a. The purpose of this test is to investigate whether first year GPA correlates
significantly with students’ graduation from Bahir Dar University.
b. The teacher is interested in examining how a newly developed test scores of students
correlate with the existing measures of reading ability of students.
c. The teacher gives much emphasis to represent the contents and objectives of the
course taught in the classroom in the test he is preparing.

8.2. Reliability
Dear learner, if you measure the length of a table using the same measuring instrument ten
times in a day, will you get exactly the same results? Most experience indicates that it is very
difficult to get exactly the same results; there is some deviation in the results obtained from
one measurement to the next. This difference may be due to the way the person measures the
table, or to other factors. In classrooms, it may happen that students in your classes give
different answers to the same questions administered at different times. This can be due to a
number of factors like students’ boredom, poor instruction, guessing effects, etc. This
variation makes it difficult to make appropriate judgment about your students’ learning. The
problems represented here are related to reliability.

Reliability can be defined as a measure of how consistent our measurements are. Scores that
are highly reliable are accurate and can be reproduced.

Why is reliability important? Can you give some reasons why it is important? If it is
difficult for you to give an answer to this question, please discuss it with your study group
members.

Reliability refers to the consistency of test scores.

Consistent measurement is a necessary condition for high quality educational and


psychological testing. If the scores from a test are very inconsistent, exhibiting large
fluctuations from one sample of performance to the next, then the teacher has no valid basis
for using these scores to decide a student’s general status on his/her educational process. A
necessary condition for a test score to be valid is that it has an acceptable degree of reliability.
Moreover, as the degree of reliability of test scores diminishes (decreases), so does their
degree of validity.

Reliability affects validity and quality of decision.


Note that
although reliability is a necessary condition for valid test score, it is not a sufficient condition.
In other words, reliable test is not necessarily a valid test. Other factors should be considered
for validity in addition to reliability.

8.2.1. Theoretical Representation of the Concept of Reliability


After the test papers are corrected, students receive marks. The marks represent the number
of points students earn by getting items correct. We call these marks obtained scores.

Do you think that the observed scores you receive from any test are purely an outcome of
your knowledge? Basically, they are not. Why is that so?

You get some items correct by guessing and other items correct because you know the
answers; sometimes you miss the correct answer due to a problem in the wording of the
items. Thus, we can say that an obtained score can be divided into two parts: a true score
and an error score.

The true score is the part of the obtained score that the student earns by answering items
correctly through his/her knowledge. The error score is the part of the score that the student
gets by chance (by guessing), or loses due to some intervening factor. In other words, the
error score is the amount of score that is added to, or subtracted from, the true score because
of factors that intervene in the testing situation.
The following equation shows the relationship among observed test score, true score and
error score.
Observed Score (X) = True Score (T) + Error Score (E)
Reliability ( rxx ) of test scores can be expressed as the ratio of the variance of true scores to
the variance of observed scores.
 T2
rxx = 2
X
The above equation indicates that the reliability coefficient is an index of how much of the
variability in observed scores is the result of variability in true scores. If rxx approaches 1.00,
it is possible to say that the observed scores are accurate indicators of the true scores. In
contrast, if it approaches 0.00, it is possible to say that the observed scores are mostly the
outcome of error scores.
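
To see what this ratio means in practice, the following small simulation (an illustration of
ours, with invented numbers, not part of the module) generates hypothetical true scores, adds
random error to form observed scores, and checks that the variance ratio behaves as the
equation says:

import random

random.seed(1)
n = 5000
true_scores = [random.gauss(50, 10) for _ in range(n)]    # T, variance about 100
observed = [t + random.gauss(0, 5) for t in true_scores]  # X = T + E, error variance about 25

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(round(variance(true_scores) / variance(observed), 2))  # close to 100/(100 + 25) = 0.80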

Observed scores are the marks a student receives by answering the items of a test
correctly.

Here, the problem is that it is not possible to determine the value of a student’s true score and
error score. Note that neither true scores nor error scores can be “seen.” For example, what is
“true spelling ability?” It is difficult to determine the true score.

True score is the average (mean) of the observed scores the person would obtain over
repeated testing.

This is a theoretical explanation of the nature of reliability. It would hold if we could
repeatedly measure the same group of students’ knowledge using the same test at different
times. However, we cannot repeatedly measure the same students at different times using the
same tests, because the students will not be willing; and even when they are willing, their
answers to the same question will not be the same. They may change their answers from one
testing situation to the other. Therefore, we cannot determine reliability using this formula
directly. Instead, we use different estimates of reliability.

Think for a minute about tests you have taken. Were the scores you received accurate,
or was there some degree of "error" in the results? Were some results more accurate than
others? In measuring human traits, whether achievement, attitude, personality, physical skill,
or some other trait, you will almost never obtain a result that does not have some degree of
error.

Error scores are the difference between obtained scores and true scores.

Many factors contribute to the less than perfect nature of our measures. There may be
ambiguous questions, the lighting may be poor, some students may be sick, guessing on an
achievement test may be lucky or unlucky, observers may get tired, and so on. What this
means is that even if a behavior remained the same when two tests were given a week apart,
the scores would not be exactly the same because of unavoidable errors.

8.2.2. Methods of Estimating Reliability


There are several methods of estimating the reliability of a measuring instrument or test. The
common ones are stability, equivalence, stability and equivalence, internal consistency, and
rater agreement.

1. Stability (Test-retest)
A coefficient of stability is obtained by correlating scores from the same test of a group of
individuals on two different occasions. If the scores of the individuals are consistent (that is,
if those scoring high the first time also score high the second time, and so on) then the
correlation coefficient, and the reliability, are high. This test-retest procedure assumes that
the characteristic measured remains constant. Unstable traits, such as mood, should not be
expected to produce high stability coefficients. Furthermore, stability usually means that
there is a long enough time between measures (often several months) so that the consistency
in scores is not influenced by a memory or practice effect. In general, as the time gap
between measures increases, the correlation between the scores becomes lower. In addition,

144
chance and maturation (an increase in the age of the student) are some factors that affect the
reliability of the test.
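
As an illustration (the scores below are invented, not taken from the module), the coefficient
of stability is simply the Pearson correlation between the two sets of scores of the same
students:

def pearson(x, y):
    # Pearson product-moment correlation of two equal-length score lists
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

first = [55, 62, 70, 48, 81, 66, 59, 74]   # scores on the first administration
second = [58, 60, 72, 50, 79, 68, 57, 76]  # the same students, retested later
print(round(pearson(first, second), 2))    # a high value indicates stable scores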

2. Equivalence (Parallel forms)


In contrast to the test-retest procedure, the equivalent forms estimate of reliability is obtained
by giving two forms (with equal content, means, and variances) of a test to the same group on
the same day and correlating these results. With this procedure we are determining how
confidently we can generalize a person’s score to what he would receive if he took a test
composed of similar but different questions. When two equivalent or parallel forms of the
same instrument are administered to the same group of students at about the same time, and
the scores are related, the reliability that results is a coefficient of equivalence. Even though
each form is made up of different items, the scores received by an individual would be about
the same on both forms. Equivalence is one type of reliability that can be established when
the teacher has a relatively large number of items from which to construct equivalent forms.
Alternative forms of a test are needed in order to test initially absent subjects who may learn
about specific items from the first form or when an instructor has two or more sections of the
same class meeting at different times.

3. Equivalence and Stability


When a teacher needs to give a pretest and posttest to assess a change in behavior, a
reliability coefficient of equivalence and stability should be established. In this procedure,
reliability data are obtained by administering to the same group of individuals one form of a
test at one time and a second form at a later date. If a test has this type of reliability, the
teacher can be confident that a change in scores across time reflects an actual difference in
the behavior being measured. This is the most stringent type of reliability estimate, and it is
especially useful for studies involving improvement.

To minimize the effects of memory, chance and maturation factors, reliability coefficients
estimated from parallel-forms of a test are preferred to test-retest reliability coefficients. It is
assumed that examinees' true scores are the same on the parallel forms. Determining the
parallel-form reliability coefficient of a test requires that two forms of the test be
administered to the same group of examinees. For many purposes, this is impractical. It not
only requires the development of two forms of the test, it requires two administrations of the
test (like the test-retest method). Sampling items that are equivalent is very difficult. Lack of
cooperation, motivation, fatigue, and boredom by examinees on a second administration of a
test present additional weaknesses in using these methods to estimate reliability.

Fortunately, there are methods of estimating a test's reliability that require only one
administration; these methods are commonly employed in practice (and are the only
practicable method for teacher-made tests). These methods of reliability estimates are called
internal consistency reliability.

4. Internal Consistency
Internal consistency is another type of estimating reliability. It is the most common type of
reliability since it can be estimated from giving one form of a test once. There are three
common types of internal consistency: Split-half method, Kuder-Richardson, and the
Cronbach Alpha methods.

a) The Split-Half Method


To avoid the practical difficulties associated with two test administrations, a power
(unspeeded) test can be split into two parts. In split-half reliability, the items of a test that
have been administered to a group are divided into two comparable halves, and a correlation
coefficient is calculated between the halves. If each student has about the same scores on
each half, then the correlation is high and the test has high reliability. Each test half should be
of similar difficulty. Uncorrected, this method provides a lower reliability estimate than
other methods, since each score entering the correlation is based on only half the items (and
we know that, other things being equal, longer tests are more reliable than shorter tests).
This technique should not be used with speeded tests, because not all students answer all
items, a factor that spuriously inflates the correlation between the halves.

In splitting the test into two halves, one might put, for example, the odd-numbered items (1,
3, 5, …) into half A and the even-numbered items (2, 4, 6, ...) into half B. From a single test
administration, a score for each half can be obtained for each examinee, and the half-scores
can be correlated to obtain a coefficient that can be labeled roe (correlation of odd and even
item scores), that is, the "parallel-form reliability" of a half-length test. Fortunately, one can
estimate the reliability (rxx) of the full test from roe via the Spearman-Brown formula as
follows:
rxx = 2roe / (1 + roe)
Where;
roe = the Pearson correlation between the half-test scores on the odd and the even items
The Spearman-Brown split half method assumes the two halves have equal standard
deviations.
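
A minimal sketch of the odd-even split and the Spearman-Brown correction follows (the 0/1
item data are invented; statistics.correlation requires Python 3.10 or later):

from statistics import correlation  # Pearson correlation

def spearman_brown(r_oe):
    # estimated full-test reliability from the half-test correlation
    return (2 * r_oe) / (1 + r_oe)

# each row: one student's right/wrong (1/0) scores on a six-item test
items = [
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
]
odd_half = [sum(row[0::2]) for row in items]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in items]  # items 2, 4, 6
r_oe = correlation(odd_half, even_half)
print(round(spearman_brown(r_oe), 2))  # reliability estimate for the full test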

b) Kuder - Richardson (K-R) Method


Many years ago, Kuder and Richardson (1937) devised a procedure for estimating the
reliability of a test without splitting it into halves. They have developed a number of formulas
in order to correlate all items on a single test with each other when each item is scored right
or wrong, correct or incorrect, yes or no, and so on. K-R reliability is thus determined from a
single administration of a test for the same group of students, but without having to split the
test into equivalent halves. This procedure assumes that all items in the test are equivalent to
each other, and it is appropriate when the purpose of the test is to measure a single behavior,
for example, reading ability of students. If the test is used to measure language ability and
mathematical ability, we cannot determine the reliability of all items at once. We should
determine the reliability of the items for each behavior separately. If a test has items of
varying difficulty or if it measures more than one behavior, the K-R estimates would usually
be lower than the split-half reliabilities.
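
The module does not reproduce the Kuder-Richardson formulas themselves; for reference,
the most widely used of them, KR-20, can be written as follows:
KR-20 = (k / (k − 1)) × (1 − Σpq / σ²x)
Where;
k = the number of items in the test
p = the proportion of examinees answering an item correctly, and q = 1 − p (Σpq is summed
over all items)
σ²x = the variance of the total test scores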

c) Cronbach Alpha Method


The Cronbach Alpha, sometimes called the Alpha Coefficient, developed by Cronbach
(1951), also assumes that all items have similar difficulty levels. It is a much more general
form of internal consistency than the K-R formulas, and it is used for items that are not
scored right or wrong, yes or no, or true or false. The Cronbach Alpha is generally the most
appropriate type of reliability estimate for tests or questionnaires in which there is a range of
possible answers for each item.
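
As a small illustration (the rating data below are invented), coefficient alpha can be computed
directly from the item variances and the variance of the total scores; when items are scored
0/1, the same computation reduces to KR-20:

from statistics import pvariance  # population variance

# each row: one student's ratings on a four-item attitude scale (1 to 5)
scores = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [1, 2, 2, 1],
]
k = len(scores[0])
item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
total_var = pvariance([sum(row) for row in scores])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # alpha = k/(k-1) x (1 - sum of item variances / total variance)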

d) Raters Agreement
The fifth type of reliability is expressed as a coefficient of agreement. This is established by
determining the extent to which two or more persons agree about what they have seen, heard,
scored, or rated. For example, when two or more teachers score the answers of students for
essay items, will they give the same or similar scores for the students, i.e., do they agree on
what they score? If they do, then there is some consistency in measurement.

Dear student, think of a situation where you and your friends were assigned to be judges
of a beauty contest. There were five girls in the contest, and you were required to rank
the girls according to their beauty. When do we say that you have agreed with each
other about the ranks you assigned to the girls? How can we determine whether your
agreement is high or not?

This type of reliability is commonly used when two or more teachers score essay items of the
same groups of students and when performance-based assessments are used. It will be
reported as inter-rater reliability or scorer agreement and will be expressed either as a
correlation coefficient or as percentage of agreement. However, this type of analysis does not
indicate anything about consistency of performance or behavior at different times. To
compute the magnitude of agreement, we can use Cronbach alpha.
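
For instance (with invented scores), the simplest such index, the percentage of agreement
between two scorers of the same ten essay answers, can be computed as follows:

rater_a = [3, 4, 2, 5, 3, 4, 1, 2, 5, 4]  # scores given by the first teacher
rater_b = [3, 4, 2, 4, 3, 4, 1, 3, 5, 4]  # scores given by the second teacher
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
print(100 * agreements / len(rater_a), "% agreement")  # 80.0 % here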

Assume you wanted to promote a student from grade 9 to 10. However, you felt that the
test results of the student do not reflect the student’s academic competence because the
student might have copied from other students during testing. You decided to test the
student in a very strict condition. You found that the student’s scores were very poor.
What decisions are you going to make? What is the major problem of the student’s
scores under these two conditions?

8.2.3. FACTORS INFLUENCING RELIABILITY


A number of factors influence the reliability of test scores. The main ones are discussed below.
a. Test related factors
These factors include test length, difficulty of test items and score variability.
1. Test length. When a test contains a larger number of items, the reliability of the resulting
   scores tends to increase.
2. Difficulty of test items. Score variability depends on the difficulty level of the items. If
   items are too difficult, only a few students will answer them, so scores will all be low.
   On the other hand, if items are too easy, many students will answer them, so scores will
   mostly be high. In both instances, scores do not vary very much, which contributes to a
   low reliability index. In contrast, with moderately difficult items, students’ scores are
   highly likely to vary, which results in a high reliability index. According to Ebel &
   Frisbie (1991), items having a 40% to 80% difficulty level contribute much to reliability.
   On the other hand, items that more than 90 percent or fewer than 30 percent of the
   examinees answer correctly cannot contribute much to reliability.
3. Score Variability. As scores vary, reliability tends to be higher. Compared to true-false
items, multiple-choice items yield higher reliability indices. This is so because in true-
false items students have a 50% probability of getting a correct answer by chance thereby
contributing to low score variability among students. In multiple choice items with four
options, on the other hand, the probability of getting an item right by chance is 25%
which results in better score variability among students.
b. Examinee Related Factors
These factors include the nature of the group tested, student testwiseness, and student
motivation
1. Nature of the group tested. Assume that all students in your class are brilliant. When
you administer a test, they score very high and the difference between the highest and the
lowest score is very narrow. If all the students are weak, their scores would be low.
Accordingly, the range of the scores between the maximum and minimum would be low.
But if there are high achieving, average, and low achieving students in the same
classroom, their scores vary from high to low. If students considerably vary in their
achievement levels, their scores are likely to vary. In contrast in a class of similar
achievement levels, there will be low score variability. Thus, reliability is higher with
heterogeneous groups of students than with homogeneous ones.
2. Student test wiseness. Some students are wise when they are taking tests. Though they
do not know the idea, they are very clever at guessing the correct answer for the item
using some clues or something that leads them to the correct answer. Students therefore
vary in their skill of taking tests. This skill creates a difference in the students’ scores.
When students vary considerably in the level of testwiseness, error scores will be higher
which in turn results in lower reliability of scores. If students are more or less the same in
their level of testwiseness there will not be much random error, which leads to better
reliability of scores.
3. Student Motivation. Who performs well: A motivated student or an unmotivated
student? Does motivation have any effect on academic performance of students?
Obviously, academic motivation has a strong effect on students’ performance. Students
who are motivated academically tend to achieve higher than those who are not motivated.
When students are unmotivated, their performances may not reflect their actual abilities. In a
classroom where students vary in terms of motivation, there will be variability in scores,
leading to lowered reliability.
c. Administration Related Factors
Factors that are related to test administration also influence dependability of scores. These
factors include time limits and cheating opportunities
1. Time Limits. Whether a test is speeded or a power test matters when it comes to the
   reliability of the test. When internal consistency reliability indices are determined from a
   single administration of a speeded test, they will be spuriously high, because in speeded
   tests students’ scores largely reflect the number of items attempted. Thus, it is suggested
   that when reliability is to be determined for speeded tests, two test administrations be
   used and a correlation calculated.
2. Cheating. Any form of cheating reduces score reliability. Cheating includes copying
answers from others, using cheat sheets, passing answers to next exam halls, getting a test
prior to its administration, etc. All these instances give unfair advantages to some students,
resulting in a rise in their observed scores. However, if students are retested under controlled
conditions, those who had the chance to cheat on the first test will definitely score lower.
Therefore, cheating makes the first scores unreliable.

SELF TEST CHECKLIST


Now it is the time to check your understanding of the points you studied in the unit. Read the
following items and answer them by checking in one of the boxes under alternatives “Yes” or
“No”.
Yes No
Can you define the meaning of validity?  
Can you describe the different types of validity?  
Can you determine validity of test scores?  
Can you list factors that influence validity of test scores?  
Can you explain the importance of test validity?  
Can you define the meaning of reliability?  
Can you describe the different types of reliability?  
Can you determine reliability of test scores?  
Can you list factors that influence reliability of test scores?  
Can you explain the importance of test reliability?  

Is there any box that you marked “No” under it? If your answer is “yes” go back to your text
and read about it.

UNIT SUMMARY
Dear student, in this Unit the principal ideas, conclusions, and implications about reliability
and validity are presented.
• The concepts of reliability and validity are particularly important for tests that purport
(are supposed) to measure psychological constructs such as intelligence, learning ability,
or creativity.
• Validity can be defined as the degree to which a test measures what it is planned to
  measure.
• Validity has three categories: Content validity, criterion related validity and construct
validity.
• Content validity is related to how adequately the content of the test samples (represents)
the contents and objectives about which inferences are to be made.
• Criterion-related validity refers to the extent to which scores on a test are related to some
  given criterion measure or predict the success of students in the future.
• Psychologists and teachers develop tests to measure traits and abilities such as
intelligence, anxiety, creativity, and social adjustment. These traits are called constructs.
Construct validity refers to the extent to which the tests developed to measure or
determine traits or constructs could measure the “amount” of the traits or constructs a
person has.
• Reliability is the extent to which a test gives consistent or dependable results on
  different measurement occasions.
• Reliability is a necessary but not a sufficient condition for validity.
• There are different estimates of reliability. Reliability and validity are affected by a
number of factors.

SELF TEST EXERCISES


Identify the questions as of validity or reliability.
1. Does the assessment adequately evaluate academic performance relevant to the desired
outcome?
2. Does this assessment tool enable students with different learning styles or abilities to
show you what they have learned and what they can do?

3. Can multiple people use the scoring mechanism and come up with the same general
score?
4. Is the grading scheme consistent; would a student receive the same grade for the same
work on multiple evaluations?
5. Does the content examined by the assessment align with the content from the course?
6. Does this assessment method adequately address the knowledge, skills, abilities,
behavior, and values associated with the intended outcome?

UNIT NINE
ASSESSING PERFORMANCES AND PRODUCTS

INTRODUCTION
Dear learner, so far we have been discussing different types of written (or paper-and-pencil)
tests, their strengths and weaknesses, the procedures for developing items, scoring, and item
analysis. In educational process, however, there are complex instructional goals that cannot
be directly measured by paper – and – pencil tests. Many skills students learn in schools
involve some type of performance, such as delivering an oral report, driving a car, operating
audiovisual equipment, and performing athletic skills such as playing football, running,
throwing discuss, gymnastic, etc. We use authentic assessments for that purpose. For
instance, can we measure the students’ ability to drive a car using objective or essay type
items? Practically, not! Therefore, it is imperative to use another form of assessment that
focuses on measuring student performance and products.

Unit Objectives
At the end of this unit, you will be able to;
• Distinguish between traditional and authentic assessment tools.
• Construct authentic assessment tools.
• Know the functions and value of performance assessment.
• Recognize the difference between procedures and products.
• Discuss the major concerns in evaluating products and processes.
• Know how to evaluate procedures and products.
• Know the steps in developing a performance test.

Dear learner, can paper and pencil tests measure all types of learning outcomes? What
kinds of learning outcomes are difficult to measure with such tests? What is performance
assessment?

9.1 Performance Assessment Defined


In a performance assessment, examinees demonstrate their knowledge and skills by engaging
in a process or constructing a product. More broadly, a performance assessment is a system
composed of (1) a purpose for the assessment, (2) tasks (or prompts) that elicit the
performance, (3) a response demand that focuses the examinee’s performance, and (4)
systematic methods for rating performances. Scholars have also defined performance
assessment as a "systematic" attempt to observe and rate a person's ability to use skills and
knowledge to achieve specified goals described in terms of four major attributes: assessment
context, stimulus conditions or the test exercise, response mode, and scoring procedures.

A performance assessment requires you to perform a task rather than select an answer from a
ready-made list. In this method of assessment, you are actively involved in demonstrating
what you have learned. Performance assessments may be more valid indicators of your
knowledge and abilities than written tests.

Performance tests are techniques that try to establish what a person can do (the examinee
makes some type of motor or manual response, e.g., adjusting a microscope or playing a
musical instrument) as distinct from what he or she knows (e.g., Where is the African Union
headquartered?).

 What are some advantages of paying attention to performance assessment in evaluating
students, given that it includes direct observations of behavior, logs and journals, videotapes
of student performance, and the like?

Up to this point, we have discussed the measurement of behavioral changes in the cognitive
domain with conventional teacher-made pencil-and-paper achievement tests. There are,
however, many times when we want to evaluate not only what a person knows but what a
person can or will do. We are interested in performance because, so often, what a person
knows is not a good predictor of what a person can or will do. And measuring what a person
can or will do requires both an instrument (performance test) and a procedure.

Thus, there is ever-increasing attention to alternative modes of assessment to supplement
conventional objective measures. One such alternative is performance assessment.
Performance assessments require students to demonstrate mastery of a skill or procedure.
Traditionally, vocational educators have relied on performance-based assessment strategies to
judge students’ mastery of job-specific skills. The skills that must be demonstrated in
performance tasks can vary considerably. Some tasks may demand that a student demonstrate
his or her ability in a straightforward way, much as was practiced in class. One health trainee
assessment involves changing hospital bed-sheets while the bed is occupied. Other tasks
may present situations demanding that a student determine how to apply his or her learning in
an unfamiliar context (e.g., figuring out what is causing an engine to run roughly).

In general, performance tasks are hands-on activities that require students to demonstrate
their ability to perform certain actions. This category of assessment covers an extremely wide
range of behaviours, including designing products or experiments, gathering information,
tabulating and analyzing data, interpreting results, and preparing reports or presentations. In
the vocational context, performance tasks might include diagnosing a patient’s condition
based on a case study, planning and preparing a nutritionally balanced meal for a vegetarian, or
identifying computer problems in an office and fixing them. Analogously, in teaching, PGDT
students might be expected to show whether they can use appropriate methods depending on
the nature of the content, class size etc., intervene appropriately when misbehavior occurs,
and ask students valid questions during their practicum periods. Performance tasks and
assessment are particularly attractive to vocational educators because they can be used to
simulate real occupational settings and demands.

9.2 Qualities of Performance Assessment


Qualities to consider in the use of performance assessment include (1) authenticity, (2)
context, (3) cognitive complexity, (4) in-depth content coverage, (5) examinee-structured
response, (6) credibility, (7) costs, and (8) reform.

1. Authenticity
A key quality to consider in using performance assessment is authenticity, the degree to
which the assessment reflects the knowledge and skills important to a field. Fortune and
Cromack (1995) address the authenticity of tasks in discussing the “concept of fidelity”,
which relates to the degree to which a clinical examination in licensure requires the same
knowledge and skills as the task requires on the job. The quality of authenticity is evidenced
in the objective structured clinical examination, OSCE, in which interns interact with patients
to obtain a history. In this assessment, the medical interns demonstrate skills they will use in
the clinical setting. Authenticity was also evidenced in a research study that investigated
students' development of writing skills: the students demonstrated their writing skills in a
manner consistent with how the skill would be applied in real life.

2. Context
Related to authenticity is the quality of context. In a performance assessment, context frames
the design of tasks to assess complex skills within the real-world situations in which the skills
will be applied. Resnick and Resnick (1992) criticize assessments that assume that a complex

skill is fixed and will manifest in the same way, at any time, across contexts. They argue that
complex skills should be assessed in the same context in which they will be used. In
education, Baron (1991) suggests that performance tasks be presented in real-world contexts
so that students will learn that their skills and knowledge are valuable beyond the classroom.

3. Cognitive Complexity
Probably one of the most evident qualities of performance assessments is their cognitive
complexity. Performance assessments can be used to gauge examinees’ use of higher-order
cognitive strategies such as structuring a problem or assessment task, formulating a plan to
address the problem, applying information, constructing responses, and explaining the
process through which they develop their answers. Higher-order cognition was evident when
students in the science assessment categorized (classified) organisms by a key physical
characteristic.

4. In-Depth Coverage
Another quality of performance assessment is the in-depth content coverage of knowledge
and skills (Messick, 1996). In the United States National Board for Professional Teaching
Standards (NBPTS, an organization that promotes excellence in education) portfolio
component, teachers complete in-depth analyses of videotaped interactions with students;
review samples of the students’ work; and write commentaries describing, analyzing, and
reflecting on the entries. The in-depth coverage associated with performance assessment will
dictate relatively few tasks being used as compared with assessments that use multiple-choice
or a mix of multiple-choice and performance tasks. If an assessment is composed of a small
number of tasks, such as essays for a bar examination, then scores may not generalize to a
broader domain (Lane & Stone, 2006). Such may be the reason that credentialing programs
that employ performance assessments also often have a multiple-choice component to extend
the breadth of coverage.

5. Examinee-Structured Response


A hallmark of a performance assessment is that it requires examinees to construct a response
rather than choose an answer from a set of options as in selected-response formats. This
quality is evident in all of the examples we have presented. In the South Carolina Arts
Assessment Program (SCAAP), a state-level arts assessment program, students produce
drawings for the visual arts assessment and sing for the music assessment. In a task from the
history assessment of the US National Assessment of Educational Progress (NAEP), an
assessment that measures what U.S. students know and can do in various subjects across the
nation, states, and some urban districts, students write about differences in beliefs about
ownership between White Americans and Native Americans.
Examination, candidates for diplomate status in pediatric dentistry discuss the medical issues
relevant to a case study and are questioned by examiners.

6. Credibility
Credibility contributes to the use of performance-based assessments in the certification
process for the National Board for Professional Teaching Standards (NBPTS, 2006a). The
certification process eschews use of any multiple-choice items because this response format
lacked credibility with the governing board. Another example of the issue of credibility
comes from state testing programs’ replacement of multiple-choice measures of language
expression with writing samples (Stiggins, 1987b). Stiggins notes that this change occurred
because language arts educators considered writing samples to be the most valid method for
assessing composition skills and their professional organization, the National Council of
Teachers of English, demanded the most valid form of assessment.

7. Costs
Costs are another consideration in the use of performance assessments (Resnick & Resnick,
1992; Ryan, 2006) because the expenses involved in the administration and scoring of
performance assessments can be considerable. Costs include both (1) the expense of hiring
someone to observe an examinee’s performance and/or to score an examinee’s products and
(2) the time required to assess using the performance assessment format. For example, the fee
of the United States Medical Licensing Examination (USMLE), a three-part licensure test, for
the performance-based test of Clinical Skills was approximately $1,000 in 2008, whereas the
multiple-choice-based test of Clinical Knowledge was approximately $500 (National Board
of Medical Examiners [NBME], 2008). In terms of the time required to assess the relevant
skills, the Clinical Knowledge test in the USMLE contains 370 multiple-choice items
completed in an 8-hour period. In contrast, in the Clinical Skills section of the test, examinees
complete 12 tasks in an 8-hour period. These costs are not unique to the USMLE. The fee for
the NBPTS certification process in which a teacher develops a portfolio and writes essays
was $2,500 in 2008 (NBPTS, 2008). Teachers spend up to a year preparing their portfolios
and a day at an assessment center, where they demonstrate their content knowledge in
response to six exercises with 30 minutes to respond to each. Thus performance tasks should

be used for assessing the complex knowledge and skills required in a content area or
professional field.

8. Reform
Performance assessment has also been used to drive reform. Resnick and Resnick (1992)
summarize this aspect of assessment in stating: "You get what you assess. You do not get what
you do not assess. Build assessments toward which you want educators to teach." So far as
assessment is a key process and an integral part of education, educational reform should
address assessment. The world of work currently demands competence, and using
performance assessment can lead to reforms that require changing our overall instructional
process. Thus, one quality of performance assessment is its contribution to bringing about
school reform.

9.3 Designing Performance Assessments


Four steps have been identified in designing classroom performance assessments:
a) Specifying desired outcomes,
b) Selecting the focus of evaluation,
c) Determining the appropriate degree of realism, and
d) Selecting evaluation procedures (Eggen & Kauchak, 1999).
1. Specifying Desired Outcomes
This marks the first step in any performance assessment. A clearly described skill or process
helps students understand what is required and helps the teacher design appropriate
instruction. Table 9.1 provides an example based on speech.

Table 9.1: Performance outcomes in speech

Oral Presentation
1. Stands naturally.
2. Maintains eye contact.
3. Uses gestures effectively.
4. Uses clear language.
5. Has adequate volume.
6. Speaks at an appropriate rate.
7. Topics are well organized.
8. Maintains interest of the group.

 If you wish to get a driving license, what types of tests are you supposed to take? What
do we call the tests that you may take other than the paper-and-pencil type tests?
2. Selecting the Focus of Evaluation
After deciding on the desired outcomes, teachers must decide whether assessment will focus
on processes or products. Table 9.2 provides an example describing both processes and
products as components of performance assessment.

Table 9.2: Processes and products as components of performance assessment

• Math. Product: the correct answer. Process: the problem-solving steps leading to the
correct solution.
• Music. Product: performance of a work on an instrument. Process: the correct fingering
and breathing that produce the performance.
• English Composition. Product: an essay, term paper, or composition. Process: the
preparation of drafts and the thought processes that produce the product.
• Word Processing. Product: a letter or copy of the final draft. Process: proper stroking or
techniques for presenting the paper.
• Science. Product: an explanation for the outcomes of a demonstration. Process: the
thought processes involved in preparing the explanation.

3. Determining the Appropriate Degree of Realism.


This involves determining the degree of closeness of the performance task to reality. Since
the goal of using performance assessments is to determine how prepared students are to work
in real-life conditions, assessments that use reality itself are best.
However, time, expense, and safety may prevent the "real thing." For example, in driver
education the goal is to produce individuals who can drive safely. However, putting students
in heavy traffic to assess how safely they can drive is both unrealistic and unsafe. Figure 9.3
illustrates options ranging from low realism to high realism for assessing driver education skills.

Low realism → High realism: (1) the student responds to written cases; (2) the student uses a
driving simulator; (3) the student drives on quiet rural roads; (4) the student drives in all types
of traffic, including rush hour.

Figure 9.3: Continuum of realism on performance tasks

4. Selecting Evaluation Procedures.
The final step in designing performance assessments is to select (construct) evaluation
procedures that are reliable and valid. Having an evaluation procedure with well-defined
criteria increases both reliability and validity. Effective criteria are believed to have four
elements:
1. One or more dimensions that serve as a basis for assessing student performance
2. A description of each dimension
3. A scale of values on which each dimension is rated
4. Definition of each value on the scale
In performance assessment, the assessor may evaluate the procedure while the examinee is
performing the action, the final product, or both.
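
To make these four elements concrete, the following minimal Python sketch (the dimension names, descriptions, and scale values are invented for illustration and are not drawn from this module) represents a simple rubric for an oral presentation, with each dimension carrying a description, a scale of values, and a definition of each value on the scale:

    # A minimal, hypothetical rubric illustrating the four elements of
    # effective evaluation criteria: dimensions, a description of each
    # dimension, a scale of values, and a definition of each value.
    rubric = {
        "eye contact": {
            "description": "Maintains eye contact with the audience",
            "scale": {
                3: "Looks at the audience throughout the presentation",
                2: "Looks at the audience occasionally",
                1: "Rarely or never looks at the audience",
            },
        },
        "organization": {
            "description": "Presents topics in a well-organized sequence",
            "scale": {
                3: "Ideas flow logically from introduction to conclusion",
                2: "Most ideas are ordered, with some digressions",
                1: "Ideas are presented in no clear order",
            },
        },
    }

    def total_score(ratings):
        """Sum a student's ratings across all rubric dimensions."""
        return sum(ratings[dimension] for dimension in rubric)

    # Example: a student rated 3 on eye contact and 2 on organization.
    print(total_score({"eye contact": 3, "organization": 2}))  # prints 5

Writing the criteria down in this explicit form is precisely what makes the procedure more reliable: two raters using the same definitions are more likely to assign the same values.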

a) Products
One product form of performance assessment is the ubiquitous essay. Essays are
extended-response items that require the examinee to write a description, analysis, or summary
in one or more paragraphs (Feuer & Fulton, 1993). Other extended-response products include
term papers, laboratory reports, drawings, and dance performances. Products also include
short, constructed-response items that require an examinee to briefly answer questions about
reading passages, solve a mathematics problem, or complete a graph or diagram.

b) Procedures (Performances)
Performances include both oral assessments and demonstrations. In these instances, the
assessment focuses on process. An example of an oral assessment occurs when a teacher
assesses a student’s oral reading fluency and speaking skills.

c) Performances and Products


In some instances, performance assessments involve a combination of performances and
products. For example, experiments require students to engage in the scientific process by
developing hypotheses, planning and executing experiments, and summarizing findings
(Feuer & Fulton, 1993). The focus of the assessment can be the execution of the experiment
(i.e., a demonstration), the laboratory report that describes the results (i.e., a product), or both.
Similarly, a writing portfolio might be conceptualized as a product if the entries only reflect
final products; however, including drafts that provide information about the writing process
would change the focus of the portfolio to include process and product.

9.4 Scoring, Interpreting, and Reporting Results of Performance Assessments
Performance assessments can be scored in a number of ways. The most commonly used
scoring tools are checklists and rating scales. Checklists are written descriptions of dimensions
that must be present in an acceptable performance: the desired performance elements are
listed, and the scorer simply checks off their presence.

Checklists do not provide much information about a performance, especially if the
performance can be observed at different levels or degrees of quality. They are only useful
when a criterion can be translated into a yes-no answer. Table 9.4 is a checklist prepared for
evaluating experimental techniques. In some conditions, checklists might be misleading and
fail to give full information about a performance. For example, if one of your criteria for
evaluating an oral presentation is "maintains eye contact", checking it off on a checklist may
not be proper: even in a poor oral presentation a speaker might make eye contact with the
audience at the beginning and end of the presentation. In that case we would most likely check
the behavior as observed, even though, practically speaking, it was observed only at a minimal
level of acceptance. Such problems are minimized if rating scales are used.
Table 9.4: Checklist for Evaluating Experimental Studies
DIRECTIONS: Place a check under Yes or No in front of each step performed by the student.
                                                                        Yes    No
1. Writes problem at the top of each step performed.
2. States hypothesis(es).
3. Specifies values for controlled variables.
4. Presents data in a chart.
5. Makes at least two measurements of each value of the
dependent variable.
6. Draws conclusions consistent with the data in the chart.

Rating scales are written descriptions of evaluative dimensions and scales of values on which
each dimension is rated (Eggen & Kauchak, 1999). Rating scales can take descriptive,
graphic, or numerical forms. Unlike checklists, rating scales provide relatively detailed
feedback to students and allow a better chance for discussion between the teacher and the
student.
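
The contrast can be made concrete with a short Python sketch (the criteria and scores below are invented for illustration). A checklist records only the presence or absence of each behavior, whereas a rating scale records the degree to which each behavior is performed, here using the 1 (always) to 4 (never) numerical scale illustrated in Table 9.5 below:

    # Checklist scoring: each behavior is either present (True) or absent (False).
    checklist = {
        "stands naturally": True,
        "maintains eye contact": True,
        "uses clear language": False,
    }
    observed = sum(checklist.values())
    print(f"Checklist: {observed} of {len(checklist)} behaviors observed")

    # Rating-scale scoring: each behavior is rated by degree of quality
    # (1 = always performs the behavior ... 4 = never performs it).
    ratings = {
        "stands naturally": 1,
        "maintains eye contact": 3,
        "uses clear language": 2,
    }
    average = sum(ratings.values()) / len(ratings)
    print(f"Rating scale: average rating {average:.1f} (lower is better)")

Note how the rating scale distinguishes a speaker who seldom makes eye contact from one who always does, a distinction the checklist collapses into a single check mark.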
Interpretation of any performance or product usually tends to be subjective, which makes
performance assessment too risky or unreliable for some users. Therefore, those who use
performance assessments have to strive to make them as objective as possible (Elliot et al.,
2000). Elliot et al. (2000) suggest tips for making performance assessments relatively
objective; Table 9.6 lists the tips.
Table 9.5: Three types of rating scales for evaluating oral presentations
1. Numerical Rating Scales
Directions: Indicate how often the pupil performs each of these behaviors while giving an oral
presentation. For each behavior circle 1 if the pupil always performs the behavior, 2 if the pupil
usually performs the behavior, 3 if the pupil seldom performs the behavior, and 4 if the pupil never
performs the behavior.
Physical Expression
A. Stands straight and faces audience.
1 2 3 4
B. Changes facial expression with change in the tone of the presentation.
1 2 3 4
2. Graphic Rating Scale

Directions: Place an X on the line which shows how often the pupil did each of the behaviors listed
while giving an oral presentation.
Physical Expression
A. Stands straight and faces audience.

Always          Usually          Seldom          Never

B. Changes facial expression with change in the tone of the presentation.

Always          Usually          Seldom          Never


3. Descriptive Rating Scale
Directions: Place an X next to the statement which best describes the pupil's performance on
each behavior.
Physical Expression
A. Stands straight and faces audience.
___ Stands straight, always looks at audience
___ Weaves, fidgets, eyes roam from audience to ceiling
___ Constant distracting movements, no eye contact with audience

B. Changes facial expression with change in the tone of the presentation.
___ Matches facial expressions to content and emphasis
___ Facial expressions usually appropriate, occasional lack of expression
___ No match between tone and facial expression; expression distracts

Table 9.6: Tips on Performance Assessment

Enhancing the Objectivity of Results


PRINCIPLE
Maximizing the objectivity of decisions made from an assessment is valued by learners and other
test result consumers because it enhances the sense of fairness and clarity of the results.
STRATEGIES
1. Clearly state the purpose for an assessment
2. Be explicit about the assessment target and the key elements of a sound performance.
3. Describe the key elements as scoring criteria.
4. At the start of the process, and throughout, share the scoring criteria with students in terms they
understand.
5. Practice using the scoring criteria, and then monitor the consistent use of the criteria with
actual student products or performances
6. Double-check (using other raters) to ensure bias does not enter into the assessment process

 Throughout the semester, you give your students homework, assignments, and different
activities. If you or the student keeps all of these documents until the end of the semester and
uses them to evaluate the student's progress, what do we call this?

a) Assignments
An assignment refers to a task assigned to students by their teachers to be completed outside of
class. Common homework assignments may include a quantity or period of reading to be
performed, writing or typing to be completed, problems to be solved, a school project to be
built, or other skills to be practiced.

When used as an assessment device, an assignment may be judged on criteria such as:


o Concept understanding
o Content organization
o Content presentation
o Analytic ability
o Synthesis of material
o Formulation of ideas
o Use of arguments
o Content accuracy
o Content quality (originality)

o Clear conclusion
o Overall clarity
o Grammar and Spelling
o Footnotes and Bibliography

b) Portfolio Assessment
Portfolios refer to a file or folder containing a variety of information that documents a
student’s experiences and accomplishments. That is, portfolios are collections of work that
are reviewed against preset criteria to judge learning progress (Eggen and Kauchak, 1999).
In portfolios, learners have the chance to collect their products and judge from the collected
products that which they feel can indicate to others they are learning. Because they are
cumulative, connected, and occur over a period of time, they can provide a motion picture of
learning progress, as opposed to tests or quizzes, which are usually disconnected snapshots.
Portfolios may include collections of assignments, video recordings and works done by the
learner himself/herself.

According to Eggen and Kauchak (1999), two features distinguish portfolios from other
forms of assessment.
1. Portfolios collect work samples from other forms of assessment
2. Portfolios involve students in design, collection, and evaluation

Since they involve students in the assessment of their own learning progress, portfolios
promote self-regulation (self-controlling). According to Eggen & Kauchak (1999), two
questions may be raised at this point. First, who decides what should be included in the portfolio?
Second, on what criteria will the students' work be evaluated?

In response to the first question, it is suggested that students be involved in selecting what to
include in their portfolios. This provides students the opportunity to reflect on their own
learning progress. Eggen & Kauchak (1999) suggest the following guidelines that can help
make portfolios effective learning tools.
• Provide examples of portfolios when first introducing them to students.
• Involve students in the selection and evaluation of their work.
• Require that students provide an overview of each portfolio, a rationale for the inclusion
of individual works, the criteria they used for evaluation, and an overall summary of progress.
• Provide students with frequent and detailed feedback about their decisions.

The following is a list of portfolio samples in different content areas.

Table 9.7: Portfolio samples in different content areas

Elementary Math: homework, quizzes, tests, and projects completed over time.
Writing: drafts of narrative, descriptive, and persuasive essays in various stages of
development; samples of poetry.
Art: projects over the course of the year, collected to show growth in an area like perspective
or in a medium like painting.
Science: lab reports, projects, classroom notes, quizzes, and tests compiled to provide an
overview of learning progress.

Portfolios are purposeful, organized, systematic collections of student work that tell the story
of a student's efforts, progress, and achievement in specific areas. The student participates in
the selection of portfolio content, the development of guidelines for selection, and the
definition of criteria for judging merit. Portfolio assessment is a joint process for instructor
and student.

Portfolio assessment emphasizes evaluation of students' progress, processes, and performance
over time. There are two basic types of portfolios. A process portfolio serves the purpose of
classroom-level assessment on the part of both the instructor and the student. It most often
reflects formative assessment, although it may be assigned a grade at the end of the semester
or academic year. It may also include summative types of assignments that were awarded
grades.

A product portfolio is more summative in nature. It is intended for a major evaluation of
some sort and is often accompanied by an oral presentation of its contents. For example, it
may be used as an evaluation tool for graduation from a program or for the purpose of seeking
employment. In both types of portfolios, emphasis is placed on including a variety of tasks
that elicit spontaneous as well as planned language performance for a variety of purposes and
audiences, using rubrics to assess performance, and demonstrating reflection about learning,
including goal setting and self and peer assessment.

Advantages of Portfolios
• Can be used to view learning and development longitudinally

• Multiple components of the curriculum can be assessed (e.g. writing, critical thinking,
technology skills)
• Samples are more likely than test results to reflect student ability when planning, input
from others, and similar opportunities common to most work settings are available.
• May be economical in terms of student time and effort if no separate assessment
administration time is required
• Results are more likely to be meaningful at all levels (student, class, program, institution)
and can be used for diagnostic and prescriptive purposes as well
• Avoids or minimizes test anxiety and other one-shot measurement problems
• Increases the power of maximum-performance measures over more artificial or restrictive
speed measures on tests or in-class samples.
• Increases student participation (selection, revision, and evaluation) in the assessment process.

Limitations of Portfolio
• Time consuming and challenging to evaluate
• Contents may vary widely among students
• Students may fail to remember to collect items
• Transfer students may not be in a position to provide a complete portfolio
• Time intensive to convert to meaningful data
• Management of the collection and evaluation process, including the establishment of
reliable and valid grading criteria, is likely to be challenging.
• Security concerns may arise as to whether submitted samples are the students’ own work
or adhere to other measurement criteria.
• Must consider whether and how graduates will be allowed to continue accessing their
portfolios

Guidelines for Producing Improved Portfolio Assessment Tools


• Clear expectations about the purpose and collection responsibilities will help students
succeed in using the portfolio method. If the faculty wants student portfolios to represent
student development over time, they will need to be scrupulous about setting forth the
performance demand of collecting and examining works throughout the student's career.
• The success of the portfolio may be enhanced when students reflect on how all the pieces
work together to express their learning or meet department criteria.

• Develop specific, measurable criteria for observing and appraising
• Develop rubrics for greater consistency among raters
• Have more than one rater for each portfolio
• Provide training for raters and check inter-rater reliability (a minimal computation sketch
follows this list)
• Recognize that portfolios in which samples are selected by the student probably represent
their best work rather than typical work
• Cross-validate portfolios with more controlled student assessments for increased validity.
• The conceptual framework for a student learning portfolio needs to be based on (1)
agreed upon student learning goals and objectives, (2) consideration of what faculty want
to learn from the portfolios, and (3) consideration of what students should learn from the
process.
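
As a minimal sketch of what checking inter-rater reliability can involve (the scores below are invented, and Cohen's kappa is only one of several possible agreement statistics), the following Python code computes simple percent agreement and Cohen's kappa for two raters who scored the same ten portfolios on a 1-3 scale:

    from collections import Counter

    # Hypothetical scores (1-3) given by two raters to the same ten portfolios.
    rater_a = [3, 2, 3, 1, 2, 2, 3, 1, 2, 3]
    rater_b = [3, 2, 2, 1, 2, 3, 3, 1, 2, 3]
    n = len(rater_a)

    # Observed agreement: proportion of portfolios scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's score distribution.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[s] * count_b[s] for s in set(rater_a) | set(rater_b)) / n**2

    # Cohen's kappa corrects the observed agreement for chance agreement.
    kappa = (p_o - p_e) / (1 - p_e)

    print(f"Percent agreement: {p_o:.2f}")  # 0.80 for these data
    print(f"Cohen's kappa: {kappa:.2f}")    # about 0.69 for these data

A kappa noticeably below the raw percent agreement warns that some of the apparent consistency is due to chance, which is one reason training raters and piloting the rubric matter.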

SUMMARY
• Classroom assessment includes gathering data about students' learning using a range of
data gathering tools and making judgment.
• Authentic assessment is a direct examination of student performance on tasks that are
relevant to life outside of the school.
• Authentic assessments such as performance assessment and portfolios require students to
demonstrate behavior in situations that are closer to reality.
• In many cases, teachers' expectations in the affective domain are not measured: assessment
of these behaviours is inherently subjective and difficult, teachers may lack the skills needed
to assess them, and promotion, certification, graduation, and failure decisions in the school
system are based almost entirely on cognitive and psychomotor achievements, to the neglect
of the affective domain. Hence, techniques that may fill these gaps were discussed after
categorizing them into observational, self-appraisal, and peer evaluation techniques.
• There are four assumptions that govern the assessment process:
• tests are samples of behavior and serve as aids to decision making;
• a primary reason to conduct an assessment is to improve instructional activities for a
student;
• the person conducting the assessment is properly trained; and
• all forms of assessment contain error.
• A performance test can require four types of accomplishments from learners: products,
complex cognitive processes, observable performance, and attitudes and social skills.
These performances can be scored with checklists and rating scales.

• Checklists contain lists of behaviors, traits, or characteristics, which can be scored as either
present or absent.
• Rating scales assign numbers to categories representing different degrees of performance.

SELF-TEST CHECKLIST


It is time to check your understanding of assessing performances and products. Read each of
the following questions and answer it by checking one of the boxes under 'Yes' or 'No'.
Item Yes No
Can you list the types of assessment tools teachers can use in their classrooms?
Can you distinguish between traditional and authentic assessment tools?
Can you construct authentic assessment tools?
Can you define performance assessment?
Can you list different types of performance assessment?
Can you identify the differences between paper-and-pencil test and
performance assessment?

Is there any box that you marked under 'No'? If there is any, please go back to your text and
read about it.

SELF-TEST EXERCISES


1. Why is it desirable to use a variety of evaluation techniques when evaluating learning
outcomes?
2. What type of materials can be included in the portfolio of students in relation to your
subject?
3. In what specific instances can these assessment strategies (rating scales, checklists, and
rubrics) be used in your area of study? Think of specific examples and share your ideas with
your colleague.
4. You want to assess the quality of a poem written by one of your friends. How do you go
about assessing its quality? Should you use paper-and-pencil (i.e., written) tests? If not,
why not?

REFERENCES
Airasian, P.W. (1996). Assessment in the Classroom. New York: McGraw-Hill.
Borich, G.D., and Tombari, M. (1995). Educational Psychology. New York: HarperCollins
College Publishers.
Borich, G.D. (1988). Effective Teaching Methods. New York: Macmillan Publishing
Company.
Carey, L.M. (1994). Measuring and Evaluating School Learning (2nd ed.). Boston: Allyn and
Bacon.
Ebel, R.L., and Frisbie, D.A. (1991). Essentials of Educational Measurement. Englewood
Cliffs, NJ: Prentice Hall.
Eggen, P., and Kauchak, D. (1999). Educational Psychology: Windows on Classrooms (4th
ed.). Upper Saddle River, NJ: Prentice-Hall, Inc.
Elliot, S., et al. (2000). Educational Psychology: Effective Teaching, Effective Learning (3rd
ed.). Boston: McGraw-Hill Companies, Inc.
Gronlund, N.E. (1971). Measurement and Evaluation in Teaching (2nd ed.). New York:
Collier-Macmillan Limited.
Mehrens, W.A., and Lehmann, I.J. (1984). Measurement and Evaluation in Education and
Psychology. New York: Holt, Rinehart and Winston.
Nitko, A.J. (1983). Educational Tests and Measurement: An Introduction. New York:
Harcourt Brace Jovanovich, Inc.
Nitko, A.J. (1996). Educational Assessment of Students (2nd ed.). Englewood Cliffs, NJ:
Merrill Prentice Hall.
Oosterhoff, A. (1994). Classroom Application of Educational Measurement. New York:
Macmillan Publishing Company.
Payne, D.A. (1997). Applied Educational Assessment. Belmont, CA: Wadsworth Publishing
Company.
Payne, D.A. (1992). Measuring and Evaluating Educational Outcomes. New York:
Macmillan Publishing Company.
