Assessment and evaluation of learning
CHAPTER ONE
Assessment: Concept, Purpose, and Principles
1.1 INTRODUCTION
This module is designed to equip you with the basic knowledge
and practical skills required to assess students’ learning.
1.2. Concepts
You might have come across the concepts of test, measurement, assessment, and evaluation.
A. Test: in an educational context, a test refers to the presentation of a standard set of questions to be answered by students. It involves a series of questions of different item types and is given formally while a course is in progress.
• It is one instrument or tool that is used to determine students’
ability, skills or performance on specific content.
• Its purpose is to assess learning progress and identify learning difficulties.
Quiz
• A short and informal test.
• It is given during class, usually at the beginning or at the end of a lesson.
Examination
• A more comprehensive form of a test often used at the end of
a term, semester or year.
• Covers a large area of content.
• It is given at the end of a course or semester.
• Its main purpose is to assign grades.
• The number of items included is large.
B. Measurement
A systematic description of students' performance in terms of numbers.
The process of assigning numbers to represent an individual's performance is called measurement.
Measurement is the process by which the attributes of a
person are measured and described in numbers.
It is a quantitative description of the behavior or
performance of students.
E.g. Haritu correctly solved 15 of the 20 quadratic equations.
C. Assessment
Assessment is a much more comprehensive and inclusive concept
than testing and measurement. It includes the full range of
procedures (observations, rating of performances, paper and
pencil tests, etc) used to gain information about students’ learning.
Educational assessment is the process of collecting information with the purpose of making decisions about students' learning progress.
We may collect information using various instruments including
teacher made tests, observations of students including their
written works and answers to questions in class, checklists,
questionnaires and interviews.
D. Evaluation
Evaluation: refers to the process of judging the quality of student
learning on the basis of established performance standards and
assigning a value to represent the worthiness or quality of that
learning or performance.
It is concerned with determining how well students have learned.
Evaluation is based on assessment that provides evidence of
student achievement at strategic times throughout the grade/course,
often at the end of a period of learning.
E.g. 1. Tonja's work was neat.
2. Madebo is making good progress in mathematics.
Evaluation refers to a systematic process of determining the
extent to which instructional objectives are achieved by
pupils.
It may include quantitative or qualitative descriptions of pupils' performance, or both.
Evaluation always involves value judgments concerning the desirability of the results.
Evaluation = quantitative description (measurement) and/or qualitative description (non-measurement) plus value judgment
1.3. Importance and Purposes of Assessment
Assessment for improved student learning requires a range of
assessment practices to be used with the following purposes:
Assessment of Learning: this kind of assessment is usually
summative in nature, which is done at the end of a learning
task.
It is designed to provide evidence for teachers to make
decisions/judgments about students’ achievement against set
goals and standards.
It also helps to provide evidence of students’ achievement to
parents, administrators, educators and students themselves.
Assessment for Learning: this type of assessment occurs while teaching and learning are in progress, rather than at the end.
In assessment for learning, teachers use assessment evidence to monitor students' learning progress and inform their teaching.
This form of assessment is designed to provide diagnostic
information to teachers about students’ prior knowledge and
formative information about the effects of their instruction on
student learning.
It also provides students with important information about their
learning and the effectiveness of the learning strategies they are
using. It is the most important form of assessment in regard to student learning.
Assessment as learning:
Assessment as learning involves students in their own
continuous self-assessment and is designed to help students
become more self-directed learners.
Self-assessment involves helping students set their own
learning goals, monitor progress toward achieving these
goals, and make adjustments in learning strategies as
required.
Enables students to identify and reflect on elements of their own
learning.
With regard to teaching, assessment provides information about
the attainment of objectives, the effectiveness of teaching
methods and learning materials.
Overall, assessment serves the following main purposes.
1. Assessment is used to inform and guide teaching and
learning:
It provides teachers with information about what students
know and can do.
To plan effective instruction, teachers also need to know
what the student misunderstands.
2. Assessment is used to help students set learning goals:
Students need frequent opportunities to reflect on where they
are in their learning, where they need to go and what needs to
be done to achieve their learning goals.
3. Assessment is used to assign report card grades:
Grade reports provide parents, employers, schools, and other stakeholders, including the government and post-secondary institutions, with summary information about student learning.
4. Assessment is used to motivate students:
Research has shown that students will be confident and
motivated when they experience progress and achievement,
rather than the failure and defeat associated with being
compared to more successful peers.
1.4 The Role of Educational Objectives in Assessment
Educational objectives which are commonly known as learning
outcomes play a key role in both the instructional process and the
assessment process.
They are desirable changes in behavior, or outcome statements that capture specifically what knowledge, skills, and attitudes learners should be able to exhibit following the instructional process.
They serve as guides for both teaching and learning, communicate the intent of instruction to others, and provide guidelines for assessing students' learning.
Educational objectives or learning outcomes state what students are expected to be able to do at the end of the instruction.
For instance, after teaching students how to solve quadratic equations, we might expect them to have the skill of solving any quadratic equation.
A learning outcome indicates the kind of performance
students are expected to exhibit as a result of the instruction.
Classification of Educational Objectives
Bloom and his associates have developed a taxonomy of
educational objectives, which provides a practical framework
within which educational objectives could be organized and
measured.
In this taxonomy Bloom et al (1956) divided educational
objectives into three domains. These are cognitive domain,
affective domain and psychomotor domain.
Cognitive domain: This involves those objectives that deal with
the development of intellectual abilities and skills. These have to
do with the mental abilities of the brain.
Levels of the cognitive domain
Level           Description
Knowledge       recognition or recall of previously learned information
Comprehension   internalization of knowledge
Application     use of abstractions in a concrete situation
Analysis        breaking down learnt material into parts, ideas and devices for clearer understanding
Synthesis       combining components to form a new whole
Evaluation      making a quantitative or qualitative judgment about a piece of communication, a procedure, a method, a proposal, a plan, etc.
Affective Domain: the affective domain has to do with feelings and emotions. It is concerned with interests, attitudes, appreciation, emotional biases and values.
Level            Description
Receiving        freely attending to stimuli
Responding       voluntarily reacting to stimuli
Valuing          forming an attitude toward a stimulus
Organization     bringing together different values and building a consistent value system by resolving any possible conflicts between them
Characterization behaving consistently with an internally developed, stable value system
Psychomotor domain: The psychomotor domain has to do
with motor skills or abilities.
It deals with such activities which involve the use of the hand
or the whole of the body. Can you think of such abilities or skills? Consider the skills involved in running, walking, swimming, jumping, eating, playing, throwing, etc.
Levels of the psychomotor domain
Level           Description
Imitation       observing and patterning behavior after someone else
Manipulation    being able to perform certain actions by following written/oral instructions and practicing
Precision       refining, becoming more exact; few errors are apparent
Articulation    coordinating a series of actions, achieving harmony and internal consistency
Naturalization  having high-level performance become natural, without needing to think much about it
1.5 Assessment and Teacher Professional Competence in Ethiopia
A teacher should have some basic competencies in classroom assessment so as to be able to effectively assess his/her students' learning.
The seven standards articulating teacher competence in the educational
assessment of students are:
1. Teachers should be skilled in choosing assessment options appropriate
for instructional decisions. In particular, they should be familiar with
criteria for evaluating and selecting assessment methods in light of
instructional plans.
2. Teachers should be skilled in developing assessment methods appropriate
for instructional decisions. Assessment tools may be accurate and fair
(valid) or invalid. Teachers must be able to determine the quality of the
assessment tools they develop.
3. Teachers should be skilled in administering, scoring, and
interpreting the results of assessment methods. It is not enough
that teachers are able to select and develop good assessment
methods; they must also be able to apply them properly.
4. Teachers should be skilled in using assessment results when
making decisions about individual students, planning teaching,
developing curriculum, and school improvement.
5. Teachers should be skilled in developing valid student grading
procedures that use pupil assessments. Grading students is an
important part of professional practice for teachers.
6. Teachers should be skilled in communicating assessment
results to students, parents, and other educators.
7. Teachers must be well-versed in their own ethical and legal
responsibilities in assessment.
Unit Two: Assessment Strategies, Methods, and Tools
2.1 Introduction
2.2. Types and approaches to assessment
There are different approaches in conducting assessment in the
classroom.
Here are five pairs of assessment typologies: formal vs. informal, criterion-referenced vs. norm-referenced, formative vs. summative, divergent vs. convergent, and process vs. product assessment.
2.2.1. Summative versus Formative assessment
A- Formative assessment:- is conducted to monitor the
instructional process, to determine whether learning is taking
place as planned.
It occurs during instruction.
The major function of formative assessment in the classroom is to provide continuous feedback to both students and teacher concerning learning successes and failures, or how things are going in the instructional process, and to enhance students' learning.
Information is obtained through teacher observation, classroom oral questioning, homework assignments, classroom assignments, quizzes, diagnostic tests, lab reports, etc.
B- Summative assessment:-Summative assessment typically
comes at the end of a course (or unit) of instruction.
It evaluates the quality of students’ learning and assigns a mark
to that students’ work based on how effectively learners have
addressed the performance standards and criteria.
It is used for grading, to determine if the program was successful, and to certify students and improve the curriculum.
e.g. Final exams, national examinations, qualifying tests.
2.2.2 Formal vs. Informal Assessment
I. Formal Assessment: Formal assessments are where the
students are aware that the task they are doing is for
assessment purposes.
They are frequently used in summative assessments. This
usually implies a written document, such as a test, quiz, or
paper.
A formal assessment is given a numerical score or grade based
on student performance.
II. Informal Assessment: "Informal" is used to indicate
techniques that can easily be incorporated into classroom
routines and learning activities.
In the case of informal assessment, the students may be unaware that the task they are doing is for assessment purposes.
Informal assessment techniques can be used at any time without interfering with instructional time.
Their results are indicative of the student's performance on
the skill or subject of interest.
Thus they are more frequently used in formative assessments.
2.2.3 Criterion-referenced vs. Norm-referenced assessment
A-Criterion-referenced assessment:- is concerned with a way of
interpreting a test score which compares an individual’s
performance to the established standard (criteria) of performance.
The criterion-referenced interpretations enable us to describe
what an individual can do without reference to the performance
of others.
Criterion referenced assessments help to eliminate competition
and may improve cooperation.
B- Norm-referenced assessment:- refers to a form of interpreting a test score that employs the practice of comparing a student's performance to the class performance or some external average performance, such as local, state or national averages.
The focus of attention in this type of assessment is on how well
the student has done on a test in comparison with other students.
For example, students’ results in grade 8 national exams in our
country are determined based on their relative standing in
comparison to all other students who have taken the exam.
Which of the following is an example of CRA or NRA?
1. Hundito computes simple linear equations.
2. Abebe computes simple linear equation better than
75% of the students in the class.
3. Shallamo can spell words better than half of his
classmates in the language class of elementary school.
4. Huluagerish can convert temperature from the Celsius
to the Fahrenheit scale.
2.2.4 Divergent versus Convergent Assessment
Divergent assessments are those for which a range of answers or
solutions might be considered correct.
For example, a Civics teacher might ask his/her students to compare
presidential and parliamentary forms of government as preferable forms
of government for a country.
A student might favor a presidential form of government by providing
sound arguments and valid examples.
Another student also might come up with still convincing ideas
favoring parliamentary form of government.
In both cases the answers are different but convincingly correct. So in
divergent assessments there might not be one single answer.
Divergent assessment tools include essay tests and solutions to workout problems.
Convergent assessments are those which have only one correct response that the student is trying to reach.
They are generally easier to mark.
They tend to be quicker to deliver and give more specific and
directed feedback to individuals.
They can also provide wide curriculum coverage.
Objective test items are the best example and demonstrate the
value of this approach in assessing knowledge.
2.2.5 Process versus Product Assessment
Process assessment focuses on the steps or procedures underlying a
particular ability or task, i.e., the cognitive steps in performing a mathematical
operation or the procedure involved in analyzing a blood sample.
Because it provides more detailed information, process assessment is most
useful when a student is learning a new skill and for providing formative
feedback to assist in improving performance.
For example, a Biology teacher teaching his students how to identify a microorganism using a microscope might give them an activity to perform.
Here his focus is not only on whether students are able to identify the microorganism.
He should also check on whether students have followed the proper
procedures to reach the conclusion.
Product assessment focuses on evaluating the result or
outcome of a process.
Using the above examples, we would focus on the answer to
the math computation or the accuracy of the blood test
results.
Product assessment is most appropriate for documenting
proficiency or competency in a given skill.
A multiple choice test that a Mathematics teacher gives to his
students, for example, is a product assessment.
There is no way he/she will check whether students have
followed the proper procedures to get the correct answer.
2.5. Selecting and developing assessment methods and tools
2.5.1. Selecting appropriate assessment methods and tools
When selecting and constructing assessment tools, we consider the following questions:
Does the assessment adequately evaluate academic performance relevant to the desired outcome?
Does the assessment accommodate students' different learning styles?
2.5.2. Planning Tests
Tests are one of the most important and commonly used
assessment instruments used in education.
The development of valid, reliable and usable questions involves
proper planning.
Planning helps to ensure that the test covers the pre-specified
instructional objectives and the subject matter (content) under
consideration.
Hence, planning a classroom test involves identifying the instructional objectives stated earlier and the subject matter (content) covered during the teaching/learning process.
The following serves as guide in planning a classroom test:
• Determine the purpose of the test;
• Describe the instructional objectives and content to be measured.
• Determine the relative emphasis to be given to each learning
outcome;
• Select the most appropriate item formats (essay or objective);
• Develop the test blue print to guide the test construction;
• Decide on the pattern of scoring and the interpretation of result;
• Decide on the length and duration of the test, and
• Assemble the items into a test, prepare directions, and administer the test.
Table of Specification
A table of specification is a two-way table that matches the
objectives and content you have taught with the level at which
you expect your students to perform.
It is also known as Test blue print, framework or Engineering
design plan to test developers.
It is usually a two- way chart or grid
– Thinking levels along the horizontal axis
– Contents along the vertical axis
It can be designed:
a. across item types, and
b. across the levels of Bloom's taxonomy.
Purpose of the table of specification
- Identify the learning outcomes of the subject taught
- Determine the level of thinking required by the competencies
- Apportion time and the number of questions proportionately
- Align content, method and assessment
- Increase the quality of assessment items.
Test Specification (specific objectives by level of Bloom's taxonomy)
Content   Knowledge   Comprehension   Application   Analysis   Synthesis   Evaluation   Total
Cha 1     6           1               1             1          -           -            9
Cha 2     4           2               2             -          1           -            9
Cha 3     2           2               1             2          1           -            8
Cha 4     2           3               2             4          -           1            12
Cha 5     3           1               3             1          1           1            10
Total     17          9               9             8          3           2            48
The same blueprint can also be laid out across item types:
Content   True/False   Matching   Short ans.   MC   Essay
Cha 1
Cha 2
Cha 3
Cha 4
Cha 5
Total
• The rows show the content areas from which the test is to be
sampled; and the columns indicate the level of thinking
students are required to demonstrate in each of the content
areas.
• Similarly, content areas on which you have spent more
instructional time should be allotted more test items.
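To see how a blueprint can be handled mechanically, here is a minimal Python sketch (illustrative only; the chapter labels and item counts are taken from the sample table above) that stores the blueprint and checks its row, column and grand totals:

    # Table of specification from the example above: items per chapter per Bloom level.
    blueprint = {
        "Cha 1": [6, 1, 1, 1, 0, 0],
        "Cha 2": [4, 2, 2, 0, 1, 0],
        "Cha 3": [2, 2, 1, 2, 1, 0],
        "Cha 4": [2, 3, 2, 4, 0, 1],
        "Cha 5": [3, 1, 3, 1, 1, 1],
    }
    levels = ["Knowledge", "Comprehension", "Application",
              "Analysis", "Synthesis", "Evaluation"]

    for chapter, counts in blueprint.items():
        print(chapter, "row total:", sum(counts))            # 9, 9, 8, 12, 10
    for i, level in enumerate(levels):
        column_total = sum(row[i] for row in blueprint.values())
        print(level, "column total:", column_total)          # 17, 9, 9, 8, 3, 2
    print("grand total:", sum(sum(row) for row in blueprint.values()))  # 48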
2.5.3 Constructing Classroom Tests
Constructing Objective Test Items
There are various types of objective test items.
These can be classified into those that require the student to
supply the answer (supply type items) and those that require the
student to select the answer from a given set of alternatives
(selection type items).
Supply type items include completion items and short answer
questions.
Selection type test items include True/False, multiple choice and
matching.
A- True/False Test Items
Advantages of True/False (Alternative Response) Items
The main advantage of true/false items is that they do not require much time from the student to answer.
This allows a teacher to cover a wide range of content by
using a large number of such items.
In addition, true/false test items can be scored quickly,
reliably, and objectively by anybody using an answer key.
Disadvantages of True/False (Alternative Response) Items
The major disadvantage of true/false items is that when they are
used exclusively, they tend to promote memorization of factual
information: names, dates, definitions, and so on.
Some argue that another weakness of true/false items is that they encourage guessing.
This is because any student who takes such a test has a 50 percent probability of getting the right answer.
They do not discriminate between students of varying ability as well as other item types do.
They are also prone to cheating.
Suggestions to construct true/false items
Avoid negative statements, and never use double negatives.
Restrict single-item statements to single concepts.
e.g. avoid an item consisting of two statements, one part correct and the other part wrong.
Use an approximately equal number of items, reflecting the two
categories tested.
Make statements representing both categories equal in length.
Avoid specific determiners
• Most, all, always, sometimes, in most cases etc…
• Statements that include such absolutes as "always,"
"never,“ "all," "none," and "only" tend to be false;
• Statements with qualifiers such as "usually," "may,"
and "sometimes" tend to be true.
B. Matching Items
A matching item consists of two lists of words or phrases.
The test-taker must match components in one list (the premises,
presented on the left) with components in the other list (the
responses presented on the right), according to a particular kind
of association indicated in the item’s directions.
Advantages of matching items
The major advantage of matching items is its compact form,
which makes it possible to measure a large amount of related
factual material in a relatively short time.
Another advantage is its ease of construction.
Limitations of matching items
They are restricted to the measurement of factual information
based on rote learning.
Difficulty of finding homogeneous material that is significant from the perspective of the learning outcomes.
Highly susceptible (exposed) to the presence of irrelevant clues if there is a lack of plausible responses.
Suggestions for Constructing Matching Items
1. Employ homogeneous lists
e.g. all the items should deal with rivers
2. Include more responses than premises
3. Try to place all premises and responses for any matching item on a single page. This avoids the disturbance created by 50 or so students flipping the pages of the test back and forth.
4. Arrange the list of responses in logical order, for example, in
alphabetical order and number in sequence.
Column "A"              Column "B"
Invasion of Italy       1888
Battle of Matamma       1893
The Battle of Meqdela   1912
The Battle of Adwa      1926
                        1933
                        1938
C. Short Answer/Completion Test Items
These have two varieties:
The question variety: the item presented as a direct
question
What is the longest river in Ethiopia?
The completion variety: the item given in an incomplete
statement.
The longest river in Ethiopia is ___________.
Advantages of short answer/completion items
Relatively easy to construct
Reduces the possibility of guessing the correct answer.
Limitations of short answer/completion items
Scoring is difficult because of the multiplicity of plausible answers.
Largely limited to measuring specific facts, because answers are restricted to a few words, phrases or numbers.
Unsuitable for assessing complex learning outcomes.
Suggestions for construction of short answer/completion
items
1. Word the item so that the required answer is both definite &
brief
• Should be written in such a way that it would have only one
correct answer.
• Example:
Poor: An animal that eats the flesh of other animals is _____.
Better: An animal that eats the flesh of other animals is classified as _____.
2. Do not take statements directly from textbooks
3. A direct question is more desirable than an incomplete statement.
The longest river in Ethiopia is ________.
What is the longest river in Ethiopia?
4. Have the blanks occur near the end of the sentence.
Poor: The ___________ is the smallest particle of matter.
Better: The smallest particle of matter is ________.
5. Omit key words & phrases rather than trivial details.
Poor: Columbus _______ America in 1492. (discovered)
Better: Columbus discovered America in (year) ______________.
6. Avoid statements with too many blanks, because their meaning will be lost and pupils will be forced to guess.
Poor: __________ animals that are born _____ and ________ their young are called _______. (mammals)
D. Multiple-Choice Items
This is the most popular type of selected-response item.
A student is first given either a question or a partially
complete statement. This part of the item is referred to as the
item’s stem. Then three or more potential answer-options are
presented. These are usually called alternatives, choices or
options.
The correct response is called the key answer, the remaining
alternatives are called distractors.
Anatomy of a multiple-choice question
• 2 parts:
stem – present a problem situation
alternatives, options, or choices – provide possible answers
• Stems may be in the form of a question or an incomplete
statement
There are two important variants in a multiple-choice item:
1. whether the stem consists of a direct question or an incomplete statement, and
2. whether the student's choice among the alternatives is supposed to be a correct answer or a best answer. The following two examples demonstrate the difference:
• A direct-question (best-answer) multiple-choice item
Which of the following European countries has suffered most from the consequences of the Second World War?
A. Germany B. Britain C. France D. Russia (USSR)
• An incomplete-statement (correct-answer) multiple-choice item
The Second World War started in the year ________.
A. 1936 B. 1939 C. 1941 D. 1945
Advantages of multiple-choice questions
Measure varieties of learning outcomes ranging from simple
to complex.
Free from some common shortcomings (gaps) observed in true/false, matching or completion items.
The most adaptable (More flexible)—any type of subject
matter can be tested.
Can be scored easily
Less guessing
Limitations of multiple-choice questions
Difficult to construct
Takes more time to read
More space per item
Like other paper-and-pencil tests, they may measure learning outcomes only at the lower levels.
Not adaptable to the measurement of problem-solving skills or of organizing and presenting ideas.
Difficulty of finding sufficient plausible (similar) distractors.
Facilitates cheating.
Suggestions for construction of multiple-choice items
Avoid negatively stated stems. Just as with the True/False items,
negatively stated stems can create confusion in students.
Each alternative must be grammatically consistent with the item’s stem.
Poor: An electric transformer can be used
A. for storing up electricity
B. to increase the voltage of alternating current
C. It converts electrical energy into mechanical energy
D. Alternative current is changed to direct current.
Better: An electric transformer can be used to
A. Store up electricity
B. Increase voltage of alterative current
C. Convert electrical energy into mech. Energy.
D. Change alternating current to direct current
Make all alternatives plausible, but be sure that one of them is
indisputably the correct or best answer.
Randomly use all answer positions in approximately equal
numbers.
Never use "all of the above" and "none of the above."
Verbal associations between the stem and the correct answer
should be avoided.
The relative length of the alternatives should not provide a clue to
the answer.
All options should be homogeneous
Constructing Essay or Subjective test items
The distinctive feature of essay questions is that students are free
to construct, relate, and present ideas in their own words.
Learning outcomes concerned with the ability to conceptualize,
construct, organize, relate, and evaluate ideas require the
freedom of response and the originality provided by essay
questions.
Essay questions can be classified into two types – restricted-
response essay questions (structured) and extended response
essay questions.
A. Restricted-response essay questions: These types of questions
usually limit both the content and the response.
The structured or restricted-response type limits the content and form of the responses.
e.g. Explain, in not more than 200 words, the role of parenting practices in children's development.
B. The non-structured (non-restricted or extended-response) type provides freedom to select, organize, integrate, or evaluate ideas.
There is no restriction.
e.g. Explain the role of parenting practices in children's development.
Advantages of Essay items
Easier to prepare and administer
Induce good study habits
Students study more efficiently for essay-type examinations than for selection-type tests.
No guessing
Measure higher-order learning outcomes: application, analysis, synthesis, and evaluation
Emphasize the integration and application of thinking and
problem solving skills
Improve writing skills
Disadvantages of Essay items
Low validity and reliability
Subjectivity of scoring
Contaminated by extraneous factors such as spelling, handwriting, neatness, length of the answer, and the halo effect (biased judgment from a previous impression)
Depends on the mood of the examiner
The effect of first impression (if the first item is correctly
done)
Scoring the answers is a time consuming and tiresome task
Suggestions in Writing Essay items
Restrict the use of essay questions to those learning outcomes that
cannot be measured satisfactorily by objective items.
The wording of the question should be clear & unambiguous
For each question, specify the point value, an acceptable response-
length, and a recommended time allocation.
Employ more questions requiring shorter answers rather than
fewer questions requiring longer answers.
2.5.4 Constructing Performance Assessments
Performance-based assessments are needed to check whether
the desired learning outcomes are achieved up to the
expected standards.
For example, oral performance is required to assess a
student’s spoken communication skills in a certain language.
Similarly, the use of mathematics to solve meaningful
problems and to communicate solutions to others may also be
best assessed by the use of performance tasks in realistic
settings.
Performance assessment is assessment based on observation and
judgment; we look at a performance or product and make a
judgment as to its quality. Examples include the following:
Complex performances such as playing a musical instrument,
carrying out the steps in a scientific experiment, speaking a
foreign language, reading aloud with fluency, or working
productively in a group. In these cases it is the doing—the
process—that is important.
Creating complex products such as a term paper, a lab report, or a
work of art.
Performance assessments typically focus on demonstration of skills.
Examples include:
• Constructed response written examination (essays, sentence-
completion, products);
• Oral examination or presentations
• Project (individual or team)
• Written case study
• Portfolio
• Work product
• Student peer or self-evaluations
• Performance assessments can be administered to individual students or
groups of students.
2.6 Arrangement of test items
There are various methods of grouping items in an achievement
test depending on their purposes.
For most purposes the items can be arranged by a systematic
consideration of:
The type of items used
The learning outcomes measured
The difficulty of the items, and
The subject matter measured
First, the items should be arranged in sections by item type.
That is, all true/false items should be grouped together, then matching items, then all short answer or completion items, and then all multiple choice items.
Extended-response essay questions and performance tasks usually take so much time that they are administered alone.
If combined with some of the other types of items and tasks,
the extended response tasks should come last.
Arranging the sections of a test in this order produces a sequence that follows the complexity of the outcomes measured, ranging from the simple to the complex.
For this purpose, items that measure similar outcomes should
be placed together and then arranged in order of ascending
difficulty.
For example the items under the multiple choice section might
be arranged in the following order: knowledge of terms,
knowledge of specific facts, knowledge of principles, and
application of principles.
Keeping together items that measure similar learning outcomes
is especially helpful in determining the type of learning
outcomes causing students the greatest difficulty.
2.7 Administering and Scoring Tests and Reporting Results
Test Administration refers to the procedure of actually presenting
the learning task that the examinees are required to perform in
order to ascertain the degree of learning that has taken place
during the teaching-learning process.
This procedure is as important as the process of preparing the test.
This is because the validity and reliability of test scores can be greatly reduced when a test is poorly administered.
While administering test all examinees must be given fair chance
to demonstrate their achievement of the learning outcomes being
measured.
This requires the provision of a physical and psychological environment conducive to their making their best efforts, and the control of factors, such as malpractice and unnecessary threats from test administrators, that may interfere with valid measurement.
2.7.1 Ensuring Quality in Test Administration
Quality and good control are necessary components of test
administration.
The following are guidelines and steps involved in test
administration aimed at ensuring quality in test administration.
Collection of the question papers in time to be able to start the
test at the appropriate time.
Ensure compliance with the stipulated sitting arrangements for the test to prevent collusion between or among the test takers.
Ensure orderly and proper distribution of question papers to the test takers.
Do not talk unnecessarily before the test.
Test takers' time should not be wasted at the beginning of the test with unnecessary remarks, instructions or threats that may create test anxiety.
It is necessary to remind the test takers of the need to avoid
malpractices before they start and make it clear that cheating will
be penalized.
Stick to the instructions regarding the conduct of the test and
avoid giving hints to test takers who ask about particular items.
But make corrections or clarifications to the test takers whenever
necessary.
2.7.2 Credibility and Civility in Test Administration
Credibility deals with the value the eventual recipients and users
of the results of assessment place on the result with respect to the
grades obtained, certificates issued or the issuing institution.
Civility, on the other hand, asks whether the persons being assessed are in such conditions as to give their best, without hindrances and burdens in the attributes being assessed, and whether the exercise is seen as integral to or as external to the learning process.
Or Civility means treating all test-takers with respect, fairness,
and dignity, which helps reduce anxiety and creates a supportive
testing environment.
Hence, in test administration, effort should be made to see that
the test takers are given a fair and unaided chance to demonstrate
what they have learnt with respect to:
A. Instructions: Test should contain a set of instructions which
are usually of two types.
One is the instruction to the test administrator while the other
one is to the test taker.
The instruction should explain how the test should be performed.
B. Duration of the Test: Ample time should be provided for
candidates to demonstrate what they know and what they can do.
The duration of test should reflect the age and attention span of
the test takers and the purpose of the test.
C. Venue and Sitting Arrangement:
It is important to provide enough and comfortable seats with
adequate sitting arrangement for the test takers’ comfort and to
reduce collaboration between them.
Adequate lighting, good ventilation and moderate temperature
reduce test anxiety and loss of concentration which invariably
affects performance in the test.
2.7.3 Scoring tests
There are two common methods of scoring essay questions.
i. The point or analytic method: in this method each answer is compared with an already-prepared ideal marking scheme (scoring key), and marks are assigned according to the adequacy of the answer.
When used carefully, the analytic method provides a means for
maintaining uniformity in scoring between scorers and between
scripts thus improving the reliability of the scoring.
ii. The global/holistic rating method: in this method the examiner first sorts the responses into three or more categories of varying quality, based on his general or global impression on reading the responses.
The standard of quality helps to establish a relative scale, which
forms the basis for ranking responses from those with the poorest
quality response to those that have the highest quality response.
When the scorer is completely satisfied that each response is in
its proper category, it is marked accordingly.
The following guidelines would be helpful in making the scoring
of essay items easier and more reliable.
You should ensure that you are emotionally and mentally settled before scoring
All responses to one item should be scored before moving to the
next item
Write out in advance a model answer to guide yourself in grading
the students’ answers
Shuffle exam papers after scoring every question before moving
to the next
The names of test takers should not be known while scoring to
avoid bias
2.7.4 Reporting Assessment Results
School grades and progress reports serve various functions in the
school.
They provide information that is helpful to students, parents and
school personnel.
Obviously in our country’s education system, we use numeric
grades to report students’ performance at secondary school level.
For example, we may give marks to summarize students’ overall
performance.
At the same time we may hold conferences with parents to report
a qualitative description of students’ progress in their learning.
UNIT THREE:
Describing and Interpreting Test Scores
3.1. Introduction
In this unit we will see the idea of test score interpretation and the
major statistical techniques that can be used to interpret test
scores.
In particular, the methods of interpreting test scores include measures of central tendency, measures of dispersion or variability, measures of relative position, and measures of relationship or association.
3.2 Describing and interpreting test results
Test interpretation is the process of assigning meaning and usefulness to the scores obtained from a classroom test.
Kinds of scores
The most common kinds of score scales are nominal, ordinal, interval, and ratio.
1. Nominal scale: when interpreting test scores using a nominal scale, data are categorized into distinct categories without any specific order or ranking.
For example, we may assign the number 1 for males and 2 for
females, pass or fail, high, medium, or low etc…
These categories have no inherent value and are used solely for grouping and classification purposes.
2. Ordinal scale: with an ordinal scale, test scores are ranked or ordered based on their relative standing.
For example, ranking students based on their performance in a certain athletic event, or assigning a grade level such as A, B, C, or D.
We know who is best, second best, third best, etc.
But the ranks do not tell us anything about the difference between the scores, or the actual numerical value or distance between scores.
3. Interval scale: when interpreting test scores using an interval scale, the numerical values assigned to scores represent equal intervals or distances between scores (values). This allows for meaningful comparisons and calculations, such as determining the difference between two scores.
If, on a test with interval data, Almaz has a score of 60, Abebe a score of 50, and Beshadu a score of 30, we could say that the distance between Abebe's and Beshadu's scores (50 to 30) is twice the distance between Almaz's and Abebe's scores (60 to 50).
4. Ratio scale: the ratio of the scores has meaning.
Ratio scales allow individuals to make meaningful comparisons between scores by considering the magnitude of the difference.
Ratio scales have a true zero point, meaning that a score of zero indicates the absence of the characteristic being measured.
However, if a student scored 0 on a spelling test, we would not interpret the score to mean that the student had no spelling ability.
3.2.2 Measures of Central Tendency
The goal of the measures of central tendency is to provide
valuable information about the distribution of test scores.
There are three basic measures of central tendency – mean,
mode and median.
The Mean
The mean, or arithmetic average, is the most widely used
measure of central tendency.
It is the average of a set of scores computed simply by adding
together all scores and dividing by the number of scores.
Mean gives the general performance of the test-takers.
Here is an example of test scores for a math class: 82, 93, 86, 97, 82.
To find the mean, first add up all of the scores (82 + 93 + 86 + 97 + 82 = 440). Then, since there are 5 test scores, divide the sum by 5 (440 ÷ 5 = 88).
Thus, the mean is 88.
The formula used to compute the mean is:
Mean (X̄) = ΣX / N
Where, X̄ = the mean
Σ = the sum of
X = any score
N = number of scores
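As a quick check, the worked example can be reproduced in a few lines of Python (a minimal illustrative sketch, not part of the original module):

    # Mean of the five math test scores from the example above.
    scores = [82, 93, 86, 97, 82]
    mean = sum(scores) / len(scores)   # 440 / 5
    print(mean)                        # 88.0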
The Median
The median is the middle value in a data set when the scores are
arranged in ascending order.
It is the number that divides a distribution of scores exactly in
half.
When the number of scores is odd, the median is the middle
score.
If the number of scores is even, the median will be halfway
between the two middle most scores.
Example 1   Example 2   Example 3   Example 4
Scores      Scores      Scores      Scores
50          50          49          50
48          49          48          49
48          48          48          47
47          46          47          47
45          46          45          45
44          43          44          45
43          43          43          45
42          42          42          44
42          41          42          42
41          41          41          41
                        38          41
In example 1, our line would be between 44 and 45, so the median
would be halfway between them at 44.5.
In this case the median is not an actual score earned by one of the
students.
In example 2, the distance between the two middle scores (43 and
46) is more than one, so we again find the point halfway between
them for our median of 44.5.
If the number of students is uneven, the median is the one score that
is the middle score in the frequency distribution, having equal
numbers of scores above and below it.
Thus, the median is 44 in example 3, and 45 in example 4. It does
not matter if more than one student earns that score, as in example 4.
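The medians of the four example distributions can be verified with Python's statistics module (a sketch; the four lists are copied from the table above):

    import statistics

    examples = {
        "Example 1": [50, 48, 48, 47, 45, 44, 43, 42, 42, 41],
        "Example 2": [50, 49, 48, 46, 46, 43, 43, 42, 41, 41],
        "Example 3": [49, 48, 48, 47, 45, 44, 43, 42, 42, 41, 38],
        "Example 4": [50, 49, 47, 47, 45, 45, 45, 44, 42, 41, 41],
    }
    for name, data in examples.items():
        # statistics.median averages the two middle scores when N is even.
        print(name, statistics.median(data))   # 44.5, 44.5, 44, 45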
The Mode
The mode is the score (value) that occurs most frequently in a data set.
It provides information about common scores or patterns in the
data.
The mode can be useful for identifying popular response choices or
recurring themes in test scores.
For example, the following test scores, 7, 7, 7, 20, 23, 23, 24, 25,
26 have a mode of 7.
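In Python this is nearly a one-liner (a sketch using only the standard library):

    import statistics

    scores = [7, 7, 7, 20, 23, 23, 24, 25, 26]
    print(statistics.mode(scores))        # 7, the most frequent score
    print(statistics.multimode(scores))   # [7]; would list every mode if there were a tie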
In general, by considering the mean, median and mode together, test score interpreters can gain a comprehensive understanding of the distribution of scores and make informed decisions about the performance of test-takers.
3.2.3 Measures of Variability/Dispersion
The measures of central tendency focus on what is typical,
average or in the middle of a distribution.
Knowing the mean, the median or the mode (or all of these) of a distribution does not, by itself, allow us to differentiate between distributions.
Measures of variability capture such differences by indicating, with numbers, how much the scores spread out in a group.
The three most commonly used measures of variability are the range, the interquartile range, and the standard deviation.
The Range
It is the simplest measure of variability calculated by subtracting
the lowest score from the highest score.
It provides a quick and easy way to understand the spread of
scores. E.g. if a test has a range of 40 points, this indicates that
there is a wide variability in scoring among students.
For example, if the score of 10 students in a certain test is: 5, 7,
8, 10, 12, 13, 14, 15, 17, 19, then the range will be 19 -5 = 14.
Interquartile range
The interquartile range (IQR) is another range measure, in which the data are put in terms of quarters or percentiles, used to measure the spread of scores.
It is calculated as the difference between the upper quartile (Q3) and the lower quartile (Q1) of the data set.
E.g. if the interquartile range of a test is 15 points, this means that the middle 50% of scores fall within a 15-point range.
The IQR is the distance between the 25th and 75th percentiles, i.e., the first and third quartiles.
The range of the data is divided into four equal percentiles or quarters (25%).
The IQR is the range of the middle 50% of the data.
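A minimal sketch of the IQR computation, reusing the ten scores from the range example above (assuming Python 3.8+ and the "inclusive" quartile method; other methods give slightly different quartiles):

    import statistics

    scores = [5, 7, 8, 10, 12, 13, 14, 15, 17, 19]
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    print(q1, q3, q3 - q1)   # Q1 = 8.5, Q3 = 14.75, IQR = 6.25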
The Standard Deviation
It is essentially an average of the degree to which a set of
scores deviates from the mean.
If the Standard Deviation is large, it means the numbers are
spread out from their mean.
If the Standard Deviation is small, it means the numbers are
close to their mean.
Because it takes into account the amount that each score
deviates from the mean, it is a more stable measure of
variability than either the range or quartile deviation.
The procedure for calculating a standard deviation involves the following steps:
Compute the mean.
Subtract the mean from each individual's score to get the deviations.
Square each of these deviations.
Find the sum of the squared deviations, Σ(X − X̄)².
Divide the sum obtained in step 4 by N, the number of students, to get the variance.
Find the square root of the result of step 5. This number is the standard deviation (SD) of the scores.
Thus, the formula for the standard deviation (SD) is:
SD = √( Σ(X − X̄)² / N )
The individual scores of group A are: 72, 76, 80, 80, 81, 83, 84, 85, 85, and 89. The individual scores of group B are: 57, 63, 65, 71, 83, 93, 94, 95, 96, 98. Let us start with group A.
The first step in finding the standard deviation is to find all the distances (deviations) from the mean.
This is followed by squaring each distance, which gives the following results.
Score of group A   Distance from the mean   Distance squared
72                 -9.5                     90.25
76                 -5.5                     30.25
80                 -1.5                      2.25
80                 -1.5                      2.25
81                 -0.5                      0.25
83                  1.5                      2.25
84                  2.5                      6.25
85                  3.5                     12.25
85                  3.5                     12.25
89                  7.5                     56.25
Then we add up all of the squared distances, which gives us 214.5.
This is divided by the total number of scores in the group, which gives 214.5 / 10 = 21.45. This is the variance of the data set.
Variance is the average squared deviation from the mean of a set
of data.
It is used to find the standard deviation. Finally, we calculate
the Square Root of the variance.
This will give us 4.63, which is the standard deviation.
                     GROUP A   GROUP B
Average on the quiz  81.5      81.5
Standard deviation   4.63      15.1
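The whole comparison can be reproduced with a short Python function (a sketch of the population formula above, i.e. dividing by N):

    import math

    def standard_deviation(scores):
        mean = sum(scores) / len(scores)
        variance = sum((x - mean) ** 2 for x in scores) / len(scores)
        return math.sqrt(variance)

    group_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
    group_b = [57, 63, 65, 71, 83, 93, 94, 95, 96, 98]
    print(round(standard_deviation(group_a), 2))   # 4.63 (variance 21.45)
    print(round(standard_deviation(group_b), 2))   # 15.1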
3.2.4. Measures of Relative Position
There are different ways to measure the relative position of
scores.
Comparing an individual's test score to the scores of their peers, or to a larger reference group, shows how well or poorly they performed on the test in relation to others.
Suppose that you have scored 55 on a test. What do you say
about this score?
On the surface it might look bad but what if that was the
highest in the class or if that score was better than 80% of the
class? This is what we mean by relative position.
Percentiles
A percentile is a score that indicates the rank of the student
compared to others (same age or same grade), using a
hypothetical group of 100 students.
It tells you what percentage of people you did better than.
A percentile of 25 (25th percentile), for example, indicates
that the student's test performance equals or exceeds 25 out
of 100 students on the same measure.
A percentile of 87 indicates that the student equals or
surpasses 87 out of 100 (or 87% of) students.
A percentile must always refer to a student's percentile rank relative to a particular norm group.
Percentile = (number of scores below the given score / total number of scores) × 100
e.g. Suppose there are 50 students who took a math test, and you scored 75 out of 100 on the test. Find your percentile score, given that 30 students scored below you on the test.
Percentile = (30 / 50) × 100
Percentile = 0.6 × 100
Percentile = 60
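The same computation in Python (a sketch of the formula above):

    def percentile_rank(scores_below, total_scores):
        # Percentile = (number of scores below the given score / total) * 100
        return scores_below / total_scores * 100

    print(percentile_rank(30, 50))   # 60.0: better than 60% of the 50 test takers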
Quartiles
Quartile is another term referred to in percentile measure.
The total of 100% is broken into four equal parts: 25%, 50%, 75%, and 100%.
Lower Quartile is the 25th percentile. (0.25)
Median Quartile is the 50th percentile. (0.50)
Upper Quartile is the 75th percentile. (0.75)
Standard Scores
Another method of indicating a student's relative position in a group is by showing how far the raw score is above or below average.
Basically, standard scores express test performance in terms
of standard deviation units from the mean.
Standard scores are scores that are based on mean and
standard deviation.
Types of standard scores
Z-scores are measures of how many standard deviations a
particular score is above or below the mean of a distribution.
e.g. a Z score of -1.5 would indicate that a score is 1.5 standard
deviations below the mean.
We define the z-score as: z = (X − X̄) / s
Where, X = the data value in question
X̄ = the sample mean
s = the sample standard deviation
For instance, if a person scored a 70 on a test with a mean of 50
and a standard deviation of 10, then they scored 2 standard
deviations above the mean.
So, a z score of 2 means the original score was 2 standard
deviations above the mean.
If the z-score is 0, then your data value is the mean.
If the z-score > 0 (positive), then your data value is above the mean.
If the z-score < 0 (negative), then your data value is below the mean.
Example. Almaz scored a 25 on her math test.
Suppose the mean for this exam is 21, with a standard deviation of
4.
Dawit scored 60 on an English test which had a mean of 50 with a
standard deviation of 5. Who did relatively better?
We will find the respective z-scores for Almaz and Dawit.
Almaz's z-score: (25 − 21) / 4 = 1
Dawit's z-score: (60 − 50) / 5 = 2
Since Dawit had the higher z-score, we say Dawit did relatively better.
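The comparison is easy to script (a sketch of the z-score formula above):

    def z_score(x, mean, sd):
        return (x - mean) / sd

    almaz = z_score(25, 21, 4)   # 1.0
    dawit = z_score(60, 50, 5)   # 2.0
    # The higher z-score marks the relatively better performance.
    print("Dawit did relatively better" if dawit > almaz else "Almaz did relatively better")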
T Scores: This refers to any set of normally distributed
standard scores that has a mean score of 50 and a standard
deviation of 10.
It is useful for making the z-score more convenient to report.
T-scores are commonly used to compare an individual's performance to a larger group.
The T – score is obtained by multiplying the Z-score by 10
and adding the product to 50.
That is, T – Score = 50 + 10(z).
A T-score of 60 is one standard deviation above the mean, while a T-score of 30 is two standard deviations below the mean.
Example
A test has a mean score of 40 and a standard deviation of 4. What are
the T – scores of two test takers who obtained raw scores of 30 and
45 respectively in the test?
Solution
The first step in finding the T-scores is to obtain the z-scores for the
test takers.
The z-scores would then be converted to the T – scores.
In the example above, for the test taker with a raw score of 30, the z-score is:
z = (X − M) / SD, where the symbols retain their usual meanings.
X = 30, M = 40, SD = 4.
Thus, z = (30 − 40) / 4 = −10 / 4 = −2.5
The T-score is then obtained by converting the z-score (−2.5):
T-score = 50 + 10(z)
        = 50 + 10(−2.5)
        = 50 − 25
        = 25
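A sketch of the conversion in Python; it also answers the second test taker's raw score of 45 from the example:

    def t_score(x, mean, sd):
        z = (x - mean) / sd      # convert the raw score to a z-score first
        return 50 + 10 * z       # then rescale: T = 50 + 10z

    print(t_score(30, 40, 4))    # 25.0
    print(t_score(45, 40, 4))    # 62.5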
3.2.5 Measures of Relationship
If we have two sets of scores from the same group of people, it is
often desirable to know the degree to which the scores are related.
For example, the relationship between the test scores of students
for the English Subject and their overall scores of other subjects.
The degree of relationship is expressed in terms of coefficient of
correlation.
The value ranges from -1.00 to +1.00.
A perfect positive correlation is indicated by a coefficient of
+1.00 and a perfect negative correlation by a coefficient of -1.00.
A correlation of .00 indicates no relationship between the two
sets of scores.
Obviously, the larger the coefficient (positive or negative), the higher the degree of relationship expressed.
There are several different measures of relationship expressed
as correlation coefficients.
One of these is the product-moment correlation coefficient,
which is by far the most commonly used and most useful
correlation coefficient.
It is indicated by the symbol r.
The formula for obtaining the coefficient of correlation is:
r = Σ(X − X̄)(Y − Ȳ) / (N · Sx · Sy)
Where, X = score of a person on one variable
Y = score of the same person on the other variable
X̄ = mean of the X distribution
Ȳ = mean of the Y distribution
Sx = standard deviation of the X scores
Sy = standard deviation of the Y scores
N = number of pairs of scores
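A minimal Python sketch of the deviation-score formula above; the two score lists are made up for illustration:

    import statistics

    def pearson_r(xs, ys):
        n = len(xs)
        mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
        sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)   # population SDs
        return sum((x - mean_x) * (y - mean_y)
                   for x, y in zip(xs, ys)) / (n * sx * sy)

    english = [60, 70, 80, 90]   # hypothetical English scores
    overall = [58, 75, 78, 92]   # hypothetical overall scores
    print(round(pearson_r(english, overall), 2))   # 0.97: strong positive relationship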
3.3 Characteristics of a good test
Validity: it refers to the extent to which a test serves its purpose(s), that is, measures what it is intended to measure, to the extent desired.
It is all about the extent to which assessment information can be trusted (truthfulness).
The following factors can influence the validity of a test:
• Unclear directions
• Inappropriate level of difficulty
• Poorly constructed items (clues within items)
• Test items inappropriate for the objectives being measured
• Improper arrangement of items
• Cheating in exams, emotional disturbance of examinees
Reliability: Test reliability refers to the accuracy, consistency
and stability of scores students would receive on alternate
forms of the same test.
The more consistent our test results are from one measurement
to another, the less error there will be and consequently, the
greater the reliability.
There are some factors that affect the reliability of a test, including the following:
• Test length: the longer a test is, the more reliable it is ( in that
wide coverage of content is ensured) but NOT TOO LONG
• Group heterogeneity: the more heterogeneous the group of test takers, the higher the reliability
• Irregularities: lighting conditions, a testee's failure to follow directions, etc.
• Objectivity: the fairness of a test to the testee; a biased test does not portray objectivity and hence is not reliable.
• A test that is objective has high validity and reliability.
• Discrimination: a good test must be able to make a distinction between poor and good learners; it should show the slight differences in learner attainment and achievement that make it possible to distinguish between poor and good learners.
• Comprehensiveness: test items should cover much of the content of the course, that is, the subject.
• Ease of administration: a good test should not pose difficulties in administration.
• Practicality and scoring: assigning quantitative value to a test result should not be difficult (why, what and how).
• Usability: a good test should be usable, unambiguous and clearly stated, with one meaning only.
Unit Four: Item Analysis
4.2. Analyzing test items
It is the process of examining or analyzing testees' responses to each item on a test, with the basic intent of judging the quality of the items.
Item analysis is the process of studying the characteristics of test items based on data obtained from examinees.
Item analysis indicates which item is difficult or easy, which item
effectively discriminates between high and low achievers and whether
the item functions as it was intended or not.
An item that is too easy, too difficult, or that fails to show a
difference between skilled and unskilled examinees should be revised
or discarded.
The two most common statistics reported in an item analysis are the
item difficulty and the item discrimination.
4.2.1. Item difficulty level index
Item difficulty refers to how easy or hard a test question is for a group of
examinees. It is measured by the proportion of examinees who answered
the item correctly.
P is the symbol for the item difficulty index.
The p-value ranges from 0 to 1:
• Closer to 1 → easy item
• Closer to 0 → difficult item
• Ideal item difficulty for most classroom tests is between 0.3 and 0.7.
When there is a sufficient number of scores available (i.e., 100 or more)
difficulty indexes are calculated using scores from the top and bottom 27
percent of the group.
To calculate the item difficulty level, first rank the papers in
order from the highest to the lowest score.
Conti…
An item difficulty level is determined by:
P = (CRU + CRL) / (NU + NL) × 100
where, P = difficulty index
CRU = number of correct responses from the upper group
(27% of the total respondents)
CRL = number of correct responses from the lower group
(27% of the total respondents)
NU = number of respondents in the upper group
NL = number of respondents in the lower group
Equivalently, P = (success in the HSG + success in the LSG) / (HSG + LSG)
where HSG = high-scoring group and LSG = low-scoring group.
Conti….
The difficulty indexes can range between 0.0 and 1.0 and
are usually expressed as a percentage.
A higher value indicates that a greater proportion of
examinees responded to the item correctly, and it was thus
an easier item.
For maximum discrimination among students, an average
difficulty of .60 is ideal.
An item difficulty of 1.00 indicates that everyone
answered correctly, while 0.00 means no one answered
correctly.
Item difficulty interpretation
P-Value                  Percent Range   Interpretation
≥ 0.75                   75–100          Easy
≤ 0.25                   0–25            Difficult
Between 0.25 and 0.75    26–74           Average
Example: Determination of Item difficulty
Options   Upper 27%   Lower 27%   Total
A         1           4           5
B**       23          17          40
C         1           3           4
D         2           3           5
Omit      0           0           0
Total     27          27          54
From the above data,
CRU = 23, NU = 27
CRL = 17, NL = 27
Then, P = (23 + 17)/(27 + 27) × 100 = 40/54 × 100 = 0.74 × 100 = 74%
Comment: The item is somewhat easy and appropriate for a
classroom achievement test.
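As an illustration, here is a minimal Python sketch of this calculation, including the interpretation bands from the table above. The function names are my own, not from the module.

```python
def item_difficulty(cr_upper, cr_lower, n_upper, n_lower):
    """Item difficulty index as a percentage:
    P = (CRU + CRL) / (NU + NL) * 100."""
    return (cr_upper + cr_lower) / (n_upper + n_lower) * 100

def interpret_difficulty(p):
    """Interpretation bands from the table above (p in percent)."""
    if p >= 75:
        return "easy"
    if p <= 25:
        return "difficult"
    return "average"

# Worked example above: CRU = 23, CRL = 17, NU = NL = 27
p = item_difficulty(23, 17, 27, 27)
print(round(p), interpret_difficulty(p))  # -> 74 average (a somewhat easy item)
```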
Activity: Calculate the item difficulty level for the following
four-option multiple-choice test item. (The sign (*) shows the
correct answer.)

Groups         A   B   C   D*   TOTAL
High Scorers   0   1   1   8    10
Low Scorers    1   1   5   3    10
Total          1   2   6   11   20
4.2.2. Item discrimination index
The index of discrimination is a numerical indicator that enables
us to determine whether the question discriminates appropriately
between lower scoring and higher scoring students.
Item discrimination refers to how well a test item differentiates
between high-performing and low-performing students.
D-value ranges from -1 to +1:
An item discriminates in a positive direction if more students in
the upper group than the lower group get the item right.
Positive discrimination index indicates that the item functions as
it is intended.
Conti…
D = (CRU - CRL) / NU or D = (CRU - CRL) / NL
From the preceding data,
CRU = 23, CRL = 17, NU = 27, NL = 27
Therefore, D = (23 - 17)/27 = 6/27 = 0.22
Comment: It is a marginal item that needs some improvement.
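A minimal Python sketch of this computation (the function name is my own):

```python
def discrimination_index(cr_upper, cr_lower, n_group):
    """Item discrimination index D = (CRU - CRL) / N,
    where N is the size of one group (NU = NL)."""
    return (cr_upper - cr_lower) / n_group

# Preceding data: CRU = 23, CRL = 17, N = 27
print(round(discrimination_index(23, 17, 27), 2))  # -> 0.22 (marginal)
```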
Conti…
Item discrimination index can also be calculated using the following
formula:
D = (success in the HSG - success in the LSG) / ½ (HSG + LSG)
Where, HSG = High Scoring Group
LSG = Low Scoring Group
An item will have a maximum positive discriminating power if all
students from the upper group get the item right and all from the lower
group miss it.
That is, D = (27 - 0)/27 = 1
An item will have no discriminating power if all students from both the
upper and lower groups get the item right, or all miss it.
That is, D = (27 - 27)/27 = 0 or D = (0 - 0)/27 = 0
An item will have negative discriminating power if more students from
the lower group than the upper group get the item right. Such items
should be revised so that they discriminate positively, or discarded.
Moreover, items answered correctly or incorrectly by all examinees can’t
discriminate at all and should be revised so that they discriminate, or
discarded.
Conti…
The item discrimination index can vary from -1.00 to +1.00.
A negative discrimination index (between -1.00 and zero)
results when more students in the low group answered correctly
than students in the high group.
A discrimination index of zero means equal numbers of high and
low students answered correctly, so the item did not discriminate
between groups.
A positive index occurs when more students in the high group
answer correctly than the low group.
If the students in the class are fairly homogeneous in ability and
achievement, their test performance is also likely to be similar,
resulting in little discrimination between high and low groups.
Generally, indices of item discrimination can be evaluated in
the following terms, as suggested by Ebel (1972).
Table: Discrimination Indices for Item Evaluation

Index of Discrimination   Item Evaluation
0.40 and up               Very good
0.30 – 0.39               Good
0.20 – 0.29               Marginal; needs improvement
Below 0.19                Poor items (should be discarded)

Therefore, very good classroom test items should have indices of
discrimination of 0.40 or above. Poor items to be rejected or improved by
revision are those with indices of discrimination below 0.19.
Item discrimination interpretation
D-Value         Interpretation
> +.40          Positive, strong
+.20 to +.40    Positive, moderate
+.10 to +.20    Marginal; needs improvement
< +.10          Poor items; should be discarded
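As a sketch, Ebel’s guidelines can be turned into a small helper. The thresholds are taken from the Ebel (1972) table above; the function name is my own.

```python
def evaluate_item(d):
    """Classify a discrimination index D using Ebel's (1972) bands."""
    if d >= 0.40:
        return "very good item"
    if d >= 0.30:
        return "good item"
    if d >= 0.20:
        return "marginal item; needs improvement"
    return "poor item; reject or improve by revision"

print(evaluate_item(0.22))  # -> marginal item; needs improvement
```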
4.3. Evaluating the Effectiveness of Distracters
Distracters should be as attractive as the correct answer.
In a properly constructed multiple-choice item, each distracter
will be selected by some students. Specifically, it should attract
more students from the lower group than from the upper group.
If a distracter is not selected by anyone, it makes no contribution
to the functioning of the item and should be eliminated or revised.
A distractor analysis evaluates the effectiveness of the
distracters in each item by comparing the number of students
in the upper and lower groups who selected each incorrect
alternative (a good distracter will attract more students from
the lower group than the upper group).
Example: Evaluating the Effectiveness of Distracters
Options   Upper 27%   Lower 27%   Total
A**       12          7           19
B         12          10          22
C         0           0           0
D         3           10          13
Omit      0           0           0

Difficulty Index (P) = (CRU + CRL) / (NU + NL) × 100
P = (12 + 7)/(27 + 27) × 100 = 19/54 × 100 = 0.352 × 100 = 35.2%
Discriminating Power (D) = (CRU - CRL) / NU
D = (12 - 7)/27 = 5/27 = 0.185 ≈ 0.19
Comment:
1. The item discriminates in a positive direction, since 12 in the
upper group and 7 in the lower group got the item right, but the
index of discriminating power (D) is low.
2. The item is difficult for a classroom achievement test.
3. Alternative “B” is a poor distracter because it attracts more
students from the upper group than from the lower group.
4. Alternative “C” is inefficient since it attracted no one.
5. Alternative “D” is functioning as intended, for it attracted more
students from the lower group than from the upper group.
6. The discriminating power of the item may be improved by
removing ambiguity in the statement of the item and replacing
alternatives “B” and “C”.
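A minimal Python sketch of a distracter analysis along these lines, using the upper/lower counts from the example above (the function name and output wording are my own):

```python
def analyze_distractors(counts, key):
    """Judge each incorrect option: a good distracter draws more
    students from the lower group than from the upper group."""
    for option, (upper, lower) in counts.items():
        if option == key:
            continue  # skip the correct answer
        if upper + lower == 0:
            verdict = "inefficient: attracted no one; eliminate or revise"
        elif upper >= lower:
            verdict = "poor: attracts the upper group more; revise"
        else:
            verdict = "functioning as intended"
        print(f"Option {option} (upper={upper}, lower={lower}): {verdict}")

# Upper/lower 27% counts from the example above; the key is A
analyze_distractors(
    {"A": (12, 7), "B": (12, 10), "C": (0, 0), "D": (3, 10)},
    key="A",
)
```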
Unit Five: Ethical Standards of Assessment
5.2. Ethical and Professional Standards of Assessment and its
Use
Ethical standards guide teachers in fulfilling their
professional obligation to provide and use tests that are fair
to all test takers regardless of age, gender, disability,
ethnicity, religion, linguistic background, or other personal
characteristics.
Teachers should be fair in different aspects of assessment.
Conti…
The following are some ethical standards that teachers may
consider in their assessment practices.
Teachers should be skilled in choosing assessment methods
that enable them to make appropriate instructional
decisions.
Teachers need to be well-acquainted with the kinds of
information provided by a broad range of assessment
alternatives and their strengths and weaknesses.
In particular, they should be familiar with criteria for
evaluating and selecting assessment methods in light of
instructional plans.
Conti…
Teachers should develop tests that meet the intended
purpose and that are appropriate for the intended test
takers.
The teacher should be skilled in administering, scoring and
interpreting the results from diverse assessment methods.
It is not enough that teachers are able to select and develop
good assessment methods; they must also be able to apply
them properly.
Teachers should be skilled in communicating assessment
results to students, parents, and other educators.
5.3. Ethnicity and Culture in tests and assessments
Students represent a variety of cultural and linguistic
backgrounds. If the cultural and linguistic backgrounds are
ignored, students may become alienated or disengaged from the
learning and assessment process.
Teachers need to be aware of how such backgrounds may
influence student performance and the potential impact on
learning.
Classroom assessment practices should be sensitive to the
cultural and linguistic diversity of students in order to obtain
accurate information about their learning.
Assessment practices that attend to issues of cultural diversity
include those that
Conti…
acknowledge students’ cultural backgrounds.
are sensitive to those aspects of an assessment that may
hamper students’ ability to demonstrate their knowledge and
understanding.
use that knowledge to adjust or scaffold assessment practices
if necessary.
Assessment practices that attend to issues of linguistic
diversity include those that
acknowledge students’ differing linguistic abilities.
Conti…
use that knowledge to adjust or scaffold assessment practices
if necessary.
use assessment practices in which the language demands do
not unfairly prevent the students from understanding what is
expected of them.
Teachers must make every effort to address and minimize the
effect of bias in classroom assessment practices.
Conti…
Assessment should be culturally and linguistically
appropriate, fair and bias-free.
For an assessment task to be fair, its content, context, and
performance expectations should:
reflect knowledge, values, and experiences that are equally
familiar and appropriate to all students;
tap knowledge and skills that all students have had adequate
time to acquire;
be as free as possible of cultural and ethnic stereotypes.
5.4. Disability and Assessment Practices
Inclusive education is based on the idea that all students,
including those with disabilities, should be provided with the
best possible education to develop themselves.
This calls for the provision of all possible accommodations
to address the educational needs of disabled students.
Accommodations should not refer only to the teaching and
learning process; they should also cover the assessment
mechanisms and procedures.
Conti…
There are different strategies that can be considered to make
assessment practices accessible to students with disabilities
depending on the type of disability.
In general terms, however, the following strategies could be
considered in summative assessments:
Modifying assessments: - This should enable disabled
students to have full access to the assessment without giving
them any unfair advantage.
Conti…
Others’ support: - Disabled students may need the support of
others in certain assessment activities which they cannot do
independently.
For instance, they may require readers and scribes in written
exams; they may also need others’ assistance in practical activities,
such as using equipment, locating materials, drawing, and
measuring.
Time allowances: - Disabled students should be given additional
time to complete their assessments; how much extra time is
appropriate is for the individual instructor to decide, based on the
purpose and nature of the assessment.
Rest breaks: Some students may need rest breaks during the
examination. This may be to relieve pain or to attend to personal
needs.
Conti…
Flexible schedules: In some cases disabled students may require
flexibility in the scheduling of examinations. For example, some
students may find it difficult to manage a number of examinations
in quick succession and need to have examinations scheduled over a
period of days.
Alternative methods of assessment: - In certain situations where
formal methods of assessment may not be appropriate for disabled
students, the instructor should assess them using non-formal
methods such as class work, portfolios, oral presentations, etc.
Assistive Technology: Specific equipment may need to be available
to the student in an examination. Such arrangements often include
the use of personal computers, voice-activated software and screen
readers.
5.5 Gender issues in assessment
Teachers’ assessment practices can also be affected by gender
stereotypes.
The issues of gender bias and fairness in assessment are
concerned with differences in opportunities for boys and girls.
A test is biased if boys and girls with the same ability levels
tend to obtain different scores.
Test questions should be checked for:
Material or references that may be offensive to members of
one gender,
References to objects and ideas that are likely to be more
familiar to men or to women
Conti…
unequal representation of men and women as actors in test
items or representation of members of each gender only in
stereotyped roles.
If the questions involve objects and ideas that are more
familiar or less offensive to members of one gender, then the
test may be easier for individuals of that gender.
Standards for achievement on such a test may be unfair to
individuals of the gender that is less familiar with or more
offended by the objects and ideas discussed, because it may
be more difficult for such individuals to demonstrate their
abilities or their knowledge of the material.