Assessment and evaluation of learning
CHAPTER ONE
Assessment: Concept, Purpose, and Principles
1.1 INTRODUCTION
This module is designed to equip you with the basic knowledge
and practical skills required to assess students’ learning.
1.2. Concepts
You might have come across the concepts of test, measurement, assessment, and evaluation.
A. Test: in an educational context, a test refers to the presentation of a standard set of questions to be answered by students. It involves a series of questions of different item types and is given formally while a course is in progress.
• It is one instrument or tool that is used to determine students’
ability, skills or performance on specific content.
• Its purpose is to assess learning progress and identify learning difficulties.
Quiz
• A short and informal test.
• It is given during class, usually at the beginning or at the end of a lesson.
Examination
• A more comprehensive form of a test often used at the end of
a term, semester or year.
• Covers a large area of content.
• It is given at the end of a course or semester.
• Its main purpose is to assign grades.
• The number of items included is large.
B. Measurement
A systematic description of students' performance in terms of numbers.
The process of assigning numbers to represent an individual's performance is called measurement.
Measurement is the process by which the attributes of a
person are measured and described in numbers.
It is a quantitative description of the behavior or
performance of students.
E.g. Haritu correctly solved 15 of the 20 quadratic equations.
C. Assessment
Assessment is a much more comprehensive and inclusive concept
than testing and measurement. It includes the full range of
procedures (observations, rating of performances, paper and
pencil tests, etc) used to gain information about students’ learning.
Educational assessment is the process of collecting information with the purpose of making decisions about students' learning progress.
We may collect information using various instruments including
teacher made tests, observations of students including their
written works and answers to questions in class, checklists,
questionnaires and interviews.
D. Evaluation
Evaluation: refers to the process of judging the quality of student
learning on the basis of established performance standards and
assigning a value to represent the worthiness or quality of that
learning or performance.
It is concerned with determining how well students have learned.
Evaluation is based on assessment that provides evidence of
student achievement at strategic times throughout the grade/course,
often at the end of a period of learning.
E.g. 1. Tonja's work was neat.
2. Madebo is making good progress in mathematics.
Evaluation refers to a systematic process of determining the
extent to which instructional objectives are achieved by
pupils.
It may include quantitative or qualitative descriptions of pupils' performance, or both.
Evaluation always involves value judgments concerning the desirability of the results.
Evaluation = quantitative description (measurement) and/or qualitative description (non-measurement) plus value judgment
1.3. Importance and Purposes of Assessment
Assessment for improved student learning requires a range of
assessment practices to be used with the following purposes:
Assessment of Learning: this kind of assessment is usually
summative in nature, which is done at the end of a learning
task.
It is designed to provide evidence for teachers to make
decisions/judgments about students’ achievement against set
goals and standards.
It also helps to provide evidence of students’ achievement to
parents, administrators, educators and students themselves.
Assessment for Learning: this type of assessment occurs while teaching and learning are in progress, rather than at the end.
In assessment for learning, teachers use assessment evidence to monitor students' learning progress and inform their teaching.
This form of assessment is designed to provide diagnostic
information to teachers about students’ prior knowledge and
formative information about the effects of their instruction on
student learning.
It also provides students with important information about their
learning and the effectiveness of the learning strategies they are
using. It is the most important form of assessment in regard to student learning.
Assessment as learning:
Assessment as learning involves students in their own
continuous self-assessment and is designed to help students
become more self-directed learners.
Self-assessment involves helping students set their own
learning goals, monitor progress toward achieving these
goals, and make adjustments in learning strategies as
required.
Enables students to identify and reflect on elements of their own
learning.
With regard to teaching, assessment provides information about
the attainment of objectives, the effectiveness of teaching
methods and learning materials.
Overall, assessment serves the following main purposes.
1. Assessment is used to inform and guide teaching and
learning:
It provides teachers with information about what students
know and can do.
To plan effective instruction, teachers also need to know
what the student misunderstands.
2. Assessment is used to help students set learning goals:
Students need frequent opportunities to reflect on where they
are in their learning, where they need to go and what needs to
be done to achieve their learning goals.
3. Assessment is used to assign report card grades:
Grade reports provide parents, employers, schools, and other stakeholders, including the government and post-secondary institutions, with summary information about student learning.
4. Assessment is used to motivate students:
Research has shown that students will be confident and
motivated when they experience progress and achievement,
rather than the failure and defeat associated with being
compared to more successful peers.
1.4 The Role of Educational Objectives in Assessment
Educational objectives which are commonly known as learning
outcomes play a key role in both the instructional process and the
assessment process.
They are desirable changes in behavior, or outcome statements that capture specifically what knowledge, skills, and attitudes learners should be able to exhibit following the instructional process.
They serve as guides for both teaching and learning, communicate the intent of instruction to others, and provide guidelines for assessing students' learning.
Educational objectives or learning outcomes state what students are expected to be able to do at the end of the instruction.
For instance, after teaching students how to solve quadratic equations, we might expect them to have the skill of solving any quadratic equation.
A learning outcome indicates the kind of performance
students are expected to exhibit as a result of the instruction.
Classification of Educational Objectives
Bloom and his associates have developed a taxonomy of
educational objectives, which provides a practical framework
within which educational objectives could be organized and
measured.
In this taxonomy Bloom et al (1956) divided educational
objectives into three domains. These are cognitive domain,
affective domain and psychomotor domain.
Cognitive domain: This involves those objectives that deal with
the development of intellectual abilities and skills. These have to
do with the mental abilities of the brain.
Levels of the cognitive domain
Level           Description
Knowledge       recognition or recall of previously learned information
Comprehension   internalization of knowledge
Application     use of abstractions in a concrete situation
Analysis        breaking down learnt material into parts, ideas and devices for clearer understanding
Synthesis       combining components to form a new whole
Evaluation      making a quantitative or qualitative judgment about a piece of communication, a procedure, a method, a proposal, a plan, etc.
Affective Domain: the affective domain has to do with feelings and emotions. It is concerned with interests, attitudes, appreciation, emotional biases and values.
Level            Description
Receiving        freely attending to stimuli
Responding       voluntarily reacting to stimuli
Valuing          forming an attitude toward a stimulus
Organization     bringing together different values and building a consistent value system by resolving any possible conflicts between them
Characterization behaving consistently with an internally developed, stable value system
Psychomotor domain: The psychomotor domain has to do
with motor skills or abilities.
It deals with such activities which involve the use of the hand
or the whole of the body. Can you think of such abilities or skills? Consider the skills involved in running, walking, swimming, jumping, eating, playing, throwing, etc.
Levels of the psychomotor domain
Level           Description
Imitation       observing and patterning behavior after someone else
Manipulation    being able to perform certain actions by following written/oral instructions and practicing
Precision       refining, becoming more exact; few errors are apparent
Articulation    coordinating a series of actions, achieving harmony and internal consistency
Naturalization  having high-level performance become natural, without needing to think much about it
1.5 Assessment and Teacher Professional Competence in Ethiopia
A teacher should have some basic competencies in classroom assessment so as to be able to effectively assess his/her students' learning.
The seven standards articulating teacher competence in the educational
assessment of students are:
1. Teachers should be skilled in choosing assessment options appropriate
for instructional decisions. In particular, they should be familiar with
criteria for evaluating and selecting assessment methods in light of
instructional plans.
2. Teachers should be skilled in developing assessment methods appropriate
for instructional decisions. Assessment tools may be accurate and fair
(valid) or invalid. Teachers must be able to determine the quality of the
assessment tools they develop.
3. Teachers should be skilled in administering, scoring, and
interpreting the results of assessment methods. It is not enough
that teachers are able to select and develop good assessment
methods; they must also be able to apply them properly.
4. Teachers should be skilled in using assessment results when
making decisions about individual students, planning teaching,
developing curriculum, and school improvement.
5. Teachers should be skilled in developing valid student grading
procedures that use pupil assessments. Grading students is an
important part of professional practice for teachers.
6. Teachers should be skilled in communicating assessment
results to students, parents, and other educators.
7. Teachers must be well-versed in their own ethical and legal
responsibilities in assessment.
Unit Two: Assessment Strategies, Methods, and Tools
2.1 Introduction
2.2. Types and approaches to assessment
There are different approaches in conducting assessment in the
classroom.
Here are five pairs of assessment typologies: formal vs. informal, criterion-referenced vs. norm-referenced, formative vs. summative, divergent vs. convergent, and process vs. product assessment.
2.2.1. Summative versus Formative assessment
A- Formative assessment:- is conducted to monitor the
instructional process, to determine whether learning is taking
place as planned.
It occurs during instruction.
The major function of formative assessment in the classroom is to provide continuous feedback to both students and teacher concerning learning successes and failures, or how things are going in the instructional process, and to enhance students' learning.
Information is obtained through teacher observation, classroom oral questioning, homework assignments, classroom assignments, quizzes, diagnostic tests, lab reports, etc.
B- Summative assessment:-Summative assessment typically
comes at the end of a course (or unit) of instruction.
It evaluates the quality of students’ learning and assigns a mark
to that students’ work based on how effectively learners have
addressed the performance standards and criteria.
It is used for grading, to determine if the program was successful, and to certify students and improve the curriculum.
e.g. Final exams, national examinations, qualifying tests.
2.2.2 Formal vs. Informal Assessment
I. Formal Assessment: Formal assessments are where the
students are aware that the task they are doing is for
assessment purposes.
They are frequently used in summative assessments. This
usually implies a written document, such as a test, quiz, or
paper.
A formal assessment is given a numerical score or grade based
on student performance.
II. Informal Assessment: "Informal" is used to indicate
techniques that can easily be incorporated into classroom
routines and learning activities.
In the case of informal assessment, the students may be unaware that the task they are doing is for assessment purposes.
Informal assessment techniques can be used at any time without interfering with instructional time.
Their results are indicative of the student's performance on
the skill or subject of interest.
Thus they are more frequently used in formative assessments.
2.2.3 Criterion-referenced vs. Norm-referenced assessment
A-Criterion-referenced assessment:- is concerned with a way of
interpreting a test score which compares an individual’s
performance to the established standard (criteria) of performance.
The criterion-referenced interpretations enable us to describe
what an individual can do without reference to the performance
of others.
Criterion referenced assessments help to eliminate competition
and may improve cooperation.
B- Norm-referenced assessment:- refers to a form of interpreting a test score that employs the practice of comparing a student's performance to the class performance or some external average performance, such as local, state or national averages.
The focus of attention in this type of assessment is on how well
the student has done on a test in comparison with other students.
For example, students’ results in grade 8 national exams in our
country are determined based on their relative standing in
comparison to all other students who have taken the exam.
Which of the following is an example of CRA or NRA?
1. Hundito computes simple linear equations.
2. Abebe computes simple linear equation better than
75% of the students in the class.
3. Shallamo can spell words better than half of his
classmates in the language class of elementary school.
4. Huluagerish can convert temperature from the Celsius
to the Fahrenheit scale.
2.2.4 Divergent versus Convergent Assessment
Divergent assessments are those for which a range of answers or
solutions might be considered correct.
For example, a Civics teacher might ask his/her students to compare
presidential and parliamentary forms of government as preferable forms
of government for a country.
A student might favor a presidential form of government by providing
sound arguments and valid examples.
Another student also might come up with still convincing ideas
favoring parliamentary form of government.
In both cases the answers are different but convincingly correct. So in
divergent assessments there might not be one single answer.
Divergent assessment tools include essay tests and solutions to workout problems.
Convergent assessments are those which have only one correct response that the student is trying to reach.
They are generally easier to mark.
They tend to be quicker to deliver and give more specific and
directed feedback to individuals.
They can also provide wide curriculum coverage.
Objective test items are the best example and demonstrate the
value of this approach in assessing knowledge.
2.2.5 Process versus Product Assessment
Process assessment focuses on the steps or procedures underlying a
particular ability or task, i.e., the cognitive steps in performing a mathematical
operation or the procedure involved in analyzing a blood sample.
Because it provides more detailed information, process assessment is most
useful when a student is learning a new skill and for providing formative
feedback to assist in improving performance.
For example, a Biology teacher teaching his students how to identify a microorganism using a microscope might give them an activity to perform.
Here his focus is not only on whether students are able to identify the microorganism.
He should also check on whether students have followed the proper
procedures to reach the conclusion.
Product assessment focuses on evaluating the result or
outcome of a process.
Using the above examples, we would focus on the answer to
the math computation or the accuracy of the blood test
results.
Product assessment is most appropriate for documenting
proficiency or competency in a given skill.
A multiple choice test that a Mathematics teacher gives to his
students, for example, is a product assessment.
There is no way he/she will check whether students have
followed the proper procedures to get the correct answer.
2.5. Selecting and developing assessment methods and tools
2.5.1. Selecting appropriate assessment methods and tools
When selecting and constructing assessment tools, we consider the following questions:
Does the assessment adequately evaluate academic performance relevant to the desired outcome?
Does the assessment accommodate students' different learning styles?
2.5.2. Planning Tests
Tests are one of the most important and commonly used
assessment instruments used in education.
The development of valid, reliable and usable questions involves
proper planning.
Planning helps to ensure that the test covers the pre-specified
instructional objectives and the subject matter (content) under
consideration.
Hence, planning a classroom test involves identifying the instructional objectives stated earlier and the subject matter (content) covered during the teaching/learning process.
The following serves as guide in planning a classroom test:
• Determine the purpose of the test;
• Describe the instructional objectives and content to be measured.
• Determine the relative emphasis to be given to each learning
outcome;
• Select the most appropriate item formats (essay or objective);
• Develop the test blue print to guide the test construction;
• Decide on the pattern of scoring and the interpretation of result;
• Decide on the length and duration of the test, and
• Assemble the items into a test, prepare directions, and administer the test.
Table of Specification
A table of specification is a two-way table that matches the
objectives and content you have taught with the level at which
you expect your students to perform.
It is also known as Test blue print, framework or Engineering
design plan to test developers.
It is usually a two- way chart or grid
– Thinking levels along the horizontal axis
– Contents along the vertical axis
It can be designed:
a. across item types, and
b. across the levels of Bloom's taxonomy.
Purpose of the table of specification
- Identify the learning outcomes of the subject taught
- Determine the level of thinking required by the competencies
- Apportion time and the number of questions proportionately
- Align content, method and assessment
- Increase the quality of assessment items.
Test Specification (specific objectives by level of Bloom's taxonomy)
Content   Knowledge   Comprehension   Application   Analysis   Synthesis   Evaluation   Total
Cha 1     6           1               1             1          -           -            9
Cha 2     4           2               2             -          1           -            9
Cha 3     2           2               1             2          1           -            8
Cha 4     2           3               2             4          -           1            12
Cha 5     3           1               3             1          1           1            10
Total     17          9               9             8          3           2            48
The same blueprint can also be laid out across item types:
Content   True/False   Matching   Short ans.   MC   Essay
Cha 1
Cha 2
Cha 3
Cha 4
Cha 5
Total
• The rows show the content areas from which the test is to be
sampled; and the columns indicate the level of thinking
students are required to demonstrate in each of the content
areas.
• Similarly, content areas on which you have spent more
instructional time should be allotted more test items.
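To see how a blueprint can be handled mechanically, here is a minimal Python sketch (illustrative only; the chapter labels and item counts are taken from the sample table above) that stores the blueprint and checks its row, column and grand totals:

    # Table of specification from the example above: items per chapter per Bloom level.
    blueprint = {
        "Cha 1": [6, 1, 1, 1, 0, 0],
        "Cha 2": [4, 2, 2, 0, 1, 0],
        "Cha 3": [2, 2, 1, 2, 1, 0],
        "Cha 4": [2, 3, 2, 4, 0, 1],
        "Cha 5": [3, 1, 3, 1, 1, 1],
    }
    levels = ["Knowledge", "Comprehension", "Application",
              "Analysis", "Synthesis", "Evaluation"]

    for chapter, counts in blueprint.items():
        print(chapter, "row total:", sum(counts))            # 9, 9, 8, 12, 10
    for i, level in enumerate(levels):
        column_total = sum(row[i] for row in blueprint.values())
        print(level, "column total:", column_total)          # 17, 9, 9, 8, 3, 2
    print("grand total:", sum(sum(row) for row in blueprint.values()))  # 48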
2.5.3 Constructing Classroom Tests
Constructing Objective Test Items
There are various types of objective test items.
These can be classified into those that require the student to
supply the answer (supply type items) and those that require the
student to select the answer from a given set of alternatives
(selection type items).
Supply type items include completion items and short answer
questions.
Selection type test items include True/False, multiple choice and
matching.
A- True/False Test Items
Advantages of True/False (Alternative Response) Items
The main advantage of true/false items is that they do not require much time from the student to answer.
This allows a teacher to cover a wide range of content by
using a large number of such items.
In addition, true/false test items can be scored quickly,
reliably, and objectively by anybody using an answer key.
Disadvantages of True/False (Alternative Response) Items
The major disadvantage of true/false items is that when they are
used exclusively, they tend to promote memorization of factual
information: names, dates, definitions, and so on.
Some argue that another weakness of true/false items is that they encourage guessing.
This is because any student who takes such a test has a 50 percent probability of getting the right answer.
They do not discriminate between students of varying ability as well as other item types do.
They are also prone to cheating.
Suggestions to construct true/false items
Avoid negative statements, and never use double negatives.
Restrict single-item statements to single concepts.
e.g. avoid an item consisting of two statements, one part correct and the other part wrong.
Use an approximately equal number of items, reflecting the two
categories tested.
Make statements representing both categories equal in length.
Avoid specific determiners
• Most, all, always, sometimes, in most cases etc…
• Statements that include such absolutes as "always,"
"never,“ "all," "none," and "only" tend to be false;
• Statements with qualifiers such as "usually," "may,"
and "sometimes" tend to be true.
B. Matching Items
A matching item consists of two lists of words or phrases.
The test-taker must match components in one list (the premises,
presented on the left) with components in the other list (the
responses presented on the right), according to a particular kind
of association indicated in the item’s directions.
Advantages of matching items
The major advantage of matching items is its compact form,
which makes it possible to measure a large amount of related
factual material in a relatively short time.
Another advantage is its ease of construction.
Limitations of matching items
They are restricted to the measurement of factual information
based on rote learning.
Difficulty of finding homogeneous material that is significant from the perspective of the learning outcomes.
Highly susceptible (exposed) to the presence of irrelevant clues if there is a lack of plausible responses.
Suggestions for Constructing Matching Items
1. Employ homogeneous lists
e.g. all the items should deal with rivers
2. Include more responses than premises
3. Try to place all premises and responses for any matching item on a single page. This avoids the disturbance created by 50 or so students flipping the pages of the test back and forth.
4. Arrange the list of responses in logical order, for example, in
alphabetical order and number in sequence.
Column "A"              Column "B"
Invasion of Italy       1888
Battle of Matamma       1893
The Battle of Meqdela   1912
The Battle of Adwa      1926
                        1933
                        1938
C. Short Answer/Completion Test Items
These have two varieties:
The question variety: the item presented as a direct
question
What is the longest river in Ethiopia?
The completion variety: the item given in an incomplete
statement.
The longest river in Ethiopia is ___________.
Advantages of short answer/completion items
Relatively easy to construct
Reduces the possibility of guessing the correct answer.
Limitations of short answer/completion items
Scoring is difficult because of the multiplicity of plausible answers.
Largely limited to measuring specific facts, because answers are restricted to a few words, phrases or numbers.
Unsuitable for assessing complex learning outcomes.
Suggestions for construction of short answer/completion
items
1. Word the item so that the required answer is both definite &
brief
• Should be written in such a way that it would have only one
correct answer.
• Example:
Poor: An animal that eats the flesh of other animals is _____.
Better: An animal that eats the flesh of other animals is classified as _____.
2. Do not take statements directly from textbooks
3. A direct question is more desirable than an incomplete statement.
The longest river in Ethiopia is ________.
What is the longest river in Ethiopia?
4. Have the blanks occur near the end of the sentence.
Poor: The ___________ is the smallest particle of matter.
Better: The smallest particle of matter is ________.
5. Omit key words & phrases rather than trivial details.
Poor: Columbus _______ America in 1492. (discovered)
Better: Columbus discovered America in (year) ______________.
6. Avoid statements with too many blanks, because their meaning will be lost and pupils will be forced to guess.
Poor: __________ animals that are born _____ and ________ their young are called _______. (mammals)
D. Multiple-Choice Items
This is the most popular type of selected-response item.
A student is first given either a question or a partially
complete statement. This part of the item is referred to as the
item’s stem. Then three or more potential answer-options are
presented. These are usually called alternatives, choices or
options.
The correct response is called the key answer, the remaining
alternatives are called distractors.
Anatomy of a multiple-choice question
• 2 parts:
stem – present a problem situation
alternatives, options, or choices – provide possible answers
• Stems may be in the form of a question or an incomplete
statement
There are two important variants in a multiple-choice item:
1. whether the stem consists of a direct question or an incomplete statement, and
2. whether the student's choice among the alternatives is supposed to be a correct answer or a best answer. The following two examples demonstrate the difference:
• A direct-question (best-answer) multiple-choice item
Which of the following European countries has suffered most from the consequences of the Second World War?
A. Germany B. Britain C. France D. Russia (USSR)
• An incomplete-statement (correct-answer) multiple-choice item
The Second World War started in the year ________.
A. 1936 B. 1939 C. 1941 D. 1945
Advantages of multiple-choice questions
Measure varieties of learning outcomes ranging from simple
to complex.
Free from some common shortcomings (gaps) observed in true/false, matching or completion items.
The most adaptable (More flexible)—any type of subject
matter can be tested.
Can be scored easily
Less guessing
Limitations of multiple-choice questions
Difficult to construct
Takes more time to read
More space per item
Like other paper-and-pencil tests, they may measure learning outcomes only at the lower levels.
Not adaptable to the measurement of problem-solving skills or of organizing and presenting ideas.
Difficulty of finding sufficient plausible (similar) distractors.
Facilitates cheating.
Suggestions for construction of multiple-choice items
Avoid negatively stated stems. Just as with the True/False items,
negatively stated stems can create confusion in students.
Each alternative must be grammatically consistent with the item’s stem.
Poor: An electric transformer can be used
A. for storing up electricity
B. to increase the voltage of alternating current
C. It converts electrical energy into mechanical energy
D. Alternative current is changed to direct current.
Better: An electric transformer can be used to
A. Store up electricity
B. Increase voltage of alterative current
C. Convert electrical energy into mech. Energy.
D. Change alternating current to direct current
Make all alternatives plausible, but be sure that one of them is
indisputably the correct or best answer.
Randomly use all answer positions in approximately equal
numbers.
Never use "all of the above" and "none of the above."
Verbal associations between the stem and the correct answer
should be avoided.
The relative length of the alternatives should not provide a clue to
the answer.
All options should be homogeneous
Constructing Essay or Subjective test items
The distinctive feature of essay questions is that students are free
to construct, relate, and present ideas in their own words.
Learning outcomes concerned with the ability to conceptualize,
construct, organize, relate, and evaluate ideas require the
freedom of response and the originality provided by essay
questions.
Essay questions can be classified into two types – restricted-
response essay questions (structured) and extended response
essay questions.
A. Restricted-response essay questions: These types of questions
usually limit both the content and the response.
The structured or restricted-response type limits the content and form of the responses.
e.g. Explain, in not more than 200 words, the role of parenting practices in children's development.
B. The non-structured (non-restricted or extended-response) type provides freedom to select, organize, integrate, or evaluate ideas.
There is no restriction.
e.g. Explain the role of parenting practices in children's development.
Advantages of Essay items
Easier to prepare and administer
Induce good study habits
Students study more efficiently for essay-type examinations than for selection-type tests.
No guessing
Measure higher-order learning outcomes: application, analysis, synthesis, and evaluation
Emphasize the integration and application of thinking and
problem solving skills
Improve writing skills
Disadvantages of Essay items
Low validity and reliability
Subjectivity of scoring
Contaminated by extraneous factors such as spelling, handwriting, neatness, length of the answer, and the halo effect (biased judgment from a previous impression)
Depends on the mood of the examiner
The effect of first impression (if the first item is correctly
done)
Scoring the answers is a time consuming and tiresome task
Suggestions in Writing Essay items
Restrict the use of essay questions to those learning outcomes that
cannot be measured satisfactorily by objective items.
The wording of the question should be clear & unambiguous
For each question, specify the point value, an acceptable response-
length, and a recommended time allocation.
Employ more questions requiring shorter answers rather than
fewer questions requiring longer answers.
2.5.4 Constructing Performance Assessments
Performance-based assessments are needed to check whether
the desired learning outcomes are achieved up to the
expected standards.
For example, oral performance is required to assess a
student’s spoken communication skills in a certain language.
Similarly, the use of mathematics to solve meaningful
problems and to communicate solutions to others may also be
best assessed by the use of performance tasks in realistic
settings.
Performance assessment is assessment based on observation and
judgment; we look at a performance or product and make a
judgment as to its quality. Examples include the following:
Complex performances such as playing a musical instrument,
carrying out the steps in a scientific experiment, speaking a
foreign language, reading aloud with fluency, or working
productively in a group. In these cases it is the doing—the
process—that is important.
Creating complex products such as a term paper, a lab report, or a
work of art.
Performance assessments typically focus on demonstration of skills.
Examples include:
• Constructed response written examination (essays, sentence-
completion, products);
• Oral examination or presentations
• Project (individual or team)
• Written case study
• Portfolio
• Work product
• Student peer or self-evaluations
• Performance assessments can be administered to individual students or
groups of students.
2.6 Arrangement of test items
There are various methods of grouping items in an achievement
test depending on their purposes.
For most purposes the items can be arranged by a systematic
consideration of:
The type of items used
The learning outcomes measured
The difficulty of the items, and
The subject matter measured
First, the items should be arranged in sections by item type.
That is, all true/false items should be grouped together, then matching items, then all short answer or completion items, and then all multiple choice items.
Extended-response essay questions and performance tasks usually take so much time that they are administered alone.
If combined with some of the other types of items and tasks,
the extended response tasks should come last.
Arranging the sections of a test in this order produces a sequence that follows the complexity of the outcomes measured, ranging from the simple to the complex.
For this purpose, items that measure similar outcomes should
be placed together and then arranged in order of ascending
difficulty.
For example the items under the multiple choice section might
be arranged in the following order: knowledge of terms,
knowledge of specific facts, knowledge of principles, and
application of principles.
Keeping together items that measure similar learning outcomes
is especially helpful in determining the type of learning
outcomes causing students the greatest difficulty.
2.7 Administering and Scoring Tests and Reporting Results
Test Administration refers to the procedure of actually presenting
the learning task that the examinees are required to perform in
order to ascertain the degree of learning that has taken place
during the teaching-learning process.
This procedure is as important as the process of preparing the test.
This is because the validity and reliability of test scores can be greatly reduced when a test is poorly administered.
While administering test all examinees must be given fair chance
to demonstrate their achievement of the learning outcomes being
measured.
This requires the provision of a physical and psychological environment conducive to their making their best efforts, and the control of factors, such as malpractice and unnecessary threats from test administrators, that may interfere with valid measurement.
2.7.1 Ensuring Quality in Test Administration
Quality and good control are necessary components of test
administration.
The following are guidelines and steps involved in test
administration aimed at ensuring quality in test administration.
Collection of the question papers in time to be able to start the
test at the appropriate time.
Ensure compliance with the stipulated sitting arrangements for the test to prevent collusion between or among the test takers.
Ensure orderly and proper distribution of question papers to the test takers.
Do not talk unnecessarily before the test.
Test takers' time should not be wasted at the beginning of the test with unnecessary remarks, instructions or threats that may create test anxiety.
It is necessary to remind the test takers of the need to avoid
malpractices before they start and make it clear that cheating will
be penalized.
Stick to the instructions regarding the conduct of the test and
avoid giving hints to test takers who ask about particular items.
But make corrections or clarifications to the test takers whenever
necessary.
2.7.2 Credibility and Civility in Test Administration
Credibility deals with the value the eventual recipients and users
of the results of assessment place on the result with respect to the
grades obtained, certificates issued or the issuing institution.
Civility, on the other hand, asks whether the persons being assessed are in such conditions as to give their best, without hindrances and burdens in the attributes being assessed, and whether the exercise is seen as integral to or as external to the learning process.
Or Civility means treating all test-takers with respect, fairness,
and dignity, which helps reduce anxiety and creates a supportive
testing environment.
Hence, in test administration, effort should be made to see that
the test takers are given a fair and unaided chance to demonstrate
what they have learnt with respect to:
A. Instructions: Test should contain a set of instructions which
are usually of two types.
One is the instruction to the test administrator while the other
one is to the test taker.
The instruction should explain how the test should be performed.
B. Duration of the Test: Ample time should be provided for
candidates to demonstrate what they know and what they can do.
The duration of test should reflect the age and attention span of
the test takers and the purpose of the test.
C. Venue and Sitting Arrangement:
It is important to provide enough and comfortable seats with
adequate sitting arrangement for the test takers’ comfort and to
reduce collaboration between them.
Adequate lighting, good ventilation and moderate temperature
reduce test anxiety and loss of concentration which invariably
affects performance in the test.
2.7.3 Scoring tests
There are two common methods of scoring essay questions.
i. The point or analytic method: in this method each answer is compared with an already-prepared ideal marking scheme (scoring key), and marks are assigned according to the adequacy of the answer.
When used carefully, the analytic method provides a means for
maintaining uniformity in scoring between scorers and between
scripts thus improving the reliability of the scoring.
ii. The global/holistic rating method: in this method the examiner first sorts the responses into three or more categories of varying quality, based on his general or global impression on reading the responses.
The standard of quality helps to establish a relative scale, which
forms the basis for ranking responses from those with the poorest
quality response to those that have the highest quality response.
When the scorer is completely satisfied that each response is in
its proper category, it is marked accordingly.
The following guidelines would be helpful in making the scoring
of essay items easier and more reliable.
You should ensure that you are emotionally and mentally settled before scoring
All responses to one item should be scored before moving to the
next item
Write out in advance a model answer to guide yourself in grading
the students’ answers
Shuffle exam papers after scoring every question before moving
to the next
The names of test takers should not be known while scoring to
avoid bias
2.7.4 Reporting Assessment Results
School grades and progress reports serve various functions in the
school.
They provide information that is helpful to students, parents and
school personnel.
Obviously in our country’s education system, we use numeric
grades to report students’ performance at secondary school level.
For example, we may give marks to summarize students’ overall
performance.
At the same time we may hold conferences with parents to report
a qualitative description of students’ progress in their learning.
UNIT THREE:
Describing and Interpreting Test Scores
3.1. Introduction
In this unit we will see the idea of test score interpretation and the
major statistical techniques that can be used to interpret test
scores.
In particular, the methods of interpreting test scores include measures of central tendency, measures of dispersion or variability, measures of relative position, and measures of relationship or association.
3.2 Describing and interpreting test results
Test interpretation is the process of assigning meaning and usefulness to the scores obtained from a classroom test.
Kinds of scores
The most common kinds of score scales are nominal, ordinal, interval, and ratio.
1. Nominal scale: when interpreting test scores using a nominal scale, data are categorized into distinct categories without any specific order or ranking.
For example, we may assign the number 1 for males and 2 for
females, pass or fail, high, medium, or low etc…
These categories have no inherent value and are used solely for grouping and classification purposes.
2. Ordinal scale: with an ordinal scale, test scores are ranked or ordered based on their relative standing.
For example, ranking students based on their performance in a certain athletic event, or assigning a grade level such as A, B, C, or D.
We know who is best, second best, third best, etc.
But the ranks do not tell us anything about the difference between the scores, or the actual numerical value or distance between scores.
3. Interval scale: when interpreting test scores using an interval scale, the numerical values assigned to scores represent equal intervals or distances between scores (values). This allows for meaningful comparisons and calculations, such as determining the difference between two scores.
If, on a test with interval data, Almaz has a score of 60, Abebe a score of 50, and Beshadu a score of 30, we could say that the distance between Abebe's and Beshadu's scores (50 to 30) is twice the distance between Almaz's and Abebe's scores (60 to 50).
4. Ratio scale: the ratio of the scores has meaning.
Ratio scales allow individuals to make meaningful comparisons between scores by considering the magnitude of the difference.
Ratio scales have a true zero point, meaning that a score of zero indicates the absence of the characteristic being measured.
However, if a student scored 0 on a spelling test, we would not interpret the score to mean that the student had no spelling ability.
3.2.2 Measures of Central Tendency
The goal of the measures of central tendency is to provide
valuable information about the distribution of test scores.
There are three basic measures of central tendency – mean,
mode and median.
The Mean
The mean, or arithmetic average, is the most widely used
measure of central tendency.
It is the average of a set of scores computed simply by adding
together all scores and dividing by the number of scores.
Mean gives the general performance of the test-takers.
Here is an example of test scores for a math class: 82, 93, 86, 97, 82.
To find the mean, first add up all of the scores (82 + 93 + 86 + 97 + 82 = 440). Then, since there are 5 test scores, divide the sum by 5 (440 ÷ 5 = 88).
Thus, the mean is 88.
The formula used to compute the mean is:
Mean (X̄) = ΣX / N
Where, X̄ = the mean
Σ = the sum of
X = any score
N = number of scores
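As a quick check, the worked example can be reproduced in a few lines of Python (a minimal illustrative sketch, not part of the original module):

    # Mean of the five math test scores from the example above.
    scores = [82, 93, 86, 97, 82]
    mean = sum(scores) / len(scores)   # 440 / 5
    print(mean)                        # 88.0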
The Median
The median is the middle value in a data set when the scores are
arranged in ascending order.
It is the number that divides a distribution of scores exactly in
half.
When the number of scores is odd, the median is the middle
score.
If the number of scores is even, the median will be halfway
between the two middle most scores.
Example 1   Example 2   Example 3   Example 4
Scores      Scores      Scores      Scores
50          50          49          50
48          49          48          49
48          48          48          47
47          46          47          47
45          46          45          45
44          43          44          45
43          43          43          45
42          42          42          44
42          41          42          42
41          41          41          41
                        38          41
In example 1, our line would be between 44 and 45, so the median
would be halfway between them at 44.5.
In this case the median is not an actual score earned by one of the
students.
In example 2, the distance between the two middle scores (43 and
46) is more than one, so we again find the point halfway between
them for our median of 44.5.
If the number of students is uneven, the median is the one score that
is the middle score in the frequency distribution, having equal
numbers of scores above and below it.
Thus, the median is 44 in example 3, and 45 in example 4. It does
not matter if more than one student earns that score, as in example 4.
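The medians of the four example distributions can be verified with Python's statistics module (a sketch; the four lists are copied from the table above):

    import statistics

    examples = {
        "Example 1": [50, 48, 48, 47, 45, 44, 43, 42, 42, 41],
        "Example 2": [50, 49, 48, 46, 46, 43, 43, 42, 41, 41],
        "Example 3": [49, 48, 48, 47, 45, 44, 43, 42, 42, 41, 38],
        "Example 4": [50, 49, 47, 47, 45, 45, 45, 44, 42, 41, 41],
    }
    for name, data in examples.items():
        # statistics.median averages the two middle scores when N is even.
        print(name, statistics.median(data))   # 44.5, 44.5, 44, 45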
The Mode
The mode is the score (value) that occurs most frequently in a data set.
It provides information about common scores or patterns in the
data.
The mode can be useful for identifying popular response choices or
recurring themes in test scores.
For example, the following test scores, 7, 7, 7, 20, 23, 23, 24, 25,
26 have a mode of 7.
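In Python this is nearly a one-liner (a sketch using only the standard library):

    import statistics

    scores = [7, 7, 7, 20, 23, 23, 24, 25, 26]
    print(statistics.mode(scores))        # 7, the most frequent score
    print(statistics.multimode(scores))   # [7]; would list every mode if there were a tie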
In general, by considering the mean, median and mode together, test score interpreters can gain a comprehensive understanding of the distribution of scores and make informed decisions about the performance of test-takers.
3.2.3 Measures of Variability/Dispersion
The measures of central tendency focus on what is typical,
average or in the middle of a distribution.
Knowing the mean, the median or the mode (or all of these) of a distribution does not, by itself, allow us to differentiate between distributions.
Measures of variability capture such differences by indicating, with numbers, how much the scores spread out in a group.
The three most commonly used measures of variability are the range, the interquartile range, and the standard deviation.
The Range
It is the simplest measure of variability calculated by subtracting
the lowest score from the highest score.
It provides a quick and easy way to understand the spread of
scores. E.g. if a test has a range of 40 points, this indicates that
there is a wide variability in scoring among students.
For example, if the score of 10 students in a certain test is: 5, 7,
8, 10, 12, 13, 14, 15, 17, 19, then the range will be 19 -5 = 14.
Interquartile range
The interquartile range (IQR) is another range measure, in which the data are put in terms of quarters or percentiles, used to measure the spread of scores.
It is calculated as the difference between the upper quartile (Q3) and the lower quartile (Q1) of the data set.
E.g. if the interquartile range of a test is 15 points, this means that the middle 50% of scores fall within a 15-point range.
The IQR is the distance between the 25th and 75th percentiles, i.e., the first and third quartiles.
The range of the data is divided into four equal percentiles or quarters (25%).
The IQR is the range of the middle 50% of the data.
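A minimal sketch of the IQR computation, reusing the ten scores from the range example above (assuming Python 3.8+ and the "inclusive" quartile method; other methods give slightly different quartiles):

    import statistics

    scores = [5, 7, 8, 10, 12, 13, 14, 15, 17, 19]
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    print(q1, q3, q3 - q1)   # Q1 = 8.5, Q3 = 14.75, IQR = 6.25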
The Standard Deviation
It is essentially an average of the degree to which a set of
scores deviates from the mean.
If the Standard Deviation is large, it means the numbers are
spread out from their mean.
If the Standard Deviation is small, it means the numbers are
close to their mean.
Because it takes into account the amount that each score
deviates from the mean, it is a more stable measure of
variability than either the range or quartile deviation.
The procedure for calculating a standard deviation involves the following steps:
Compute the mean.
Subtract the mean from each individual's score to get the deviations.
Square each of these deviations.
Find the sum of the squared deviations, Σ(X − X̄)².
Divide the sum obtained in step 4 by N, the number of students, to get the variance.
Find the square root of the result of step 5. This number is the standard deviation (SD) of the scores.
Thus, the formula for the standard deviation (SD) is:
SD = √( Σ(X − X̄)² / N )
The individual scores of group A are: 72, 76, 80, 80, 81, 83, 84, 85, 85, and 89. The individual scores of group B are: 57, 63, 65, 71, 83, 93, 94, 95, 96, 98. Let us start with group A.
The first step in finding the standard deviation is to find all the distances (deviations) from the mean.
This is followed by squaring each distance, which gives the following results.
Score of group A   Distance from the mean   Distance squared
72                 -9.5                     90.25
76                 -5.5                     30.25
80                 -1.5                      2.25
80                 -1.5                      2.25
81                 -0.5                      0.25
83                  1.5                      2.25
84                  2.5                      6.25
85                  3.5                     12.25
85                  3.5                     12.25
89                  7.5                     56.25
Then we add up all of the squared distances, which gives us 214.5.
This is divided by the total number of scores in the group, which gives 214.5 / 10 = 21.45. This is the variance of the data set.
Variance is the average squared deviation from the mean of a set
of data.
It is used to find the standard deviation. Finally, we calculate
the Square Root of the variance.
This will give us 4.63, which is the standard deviation.
                     GROUP A   GROUP B
Average on the quiz  81.5      81.5
Standard deviation   4.63      15.1
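The whole comparison can be reproduced with a short Python function (a sketch of the population formula above, i.e. dividing by N):

    import math

    def standard_deviation(scores):
        mean = sum(scores) / len(scores)
        variance = sum((x - mean) ** 2 for x in scores) / len(scores)
        return math.sqrt(variance)

    group_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
    group_b = [57, 63, 65, 71, 83, 93, 94, 95, 96, 98]
    print(round(standard_deviation(group_a), 2))   # 4.63 (variance 21.45)
    print(round(standard_deviation(group_b), 2))   # 15.1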
3.2.4. Measures of Relative Position
There are different ways to measure the relative position of
scores.
Comparing an individual's test score to the scores of their peers, or to a larger reference group, shows how well or poorly they performed on the test in relation to others.
Suppose that you have scored 55 on a test. What do you say
about this score?
On the surface it might look bad but what if that was the
highest in the class or if that score was better than 80% of the
class? This is what we mean by relative position.
Percentiles
A percentile is a score that indicates the rank of the student
compared to others (same age or same grade), using a
hypothetical group of 100 students.
It tells you what percentage of people you did better than.
A percentile of 25 (25th percentile), for example, indicates
that the student's test performance equals or exceeds 25 out
of 100 students on the same measure.
A percentile of 87 indicates that the student equals or
surpasses 87 out of 100 (or 87% of) students.
A percentile must always refer to a student's percentile rank relative to a particular norm group.
Percentile = (number of scores below the given score / total number of scores) × 100
e.g. Suppose there are 50 students who took a math test, and you scored 75 out of 100 on the test. Find your percentile score, given that 30 students scored below you on the test.
Percentile = (30 / 50) × 100
Percentile = 0.6 × 100
Percentile = 60
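The same computation in Python (a sketch of the formula above):

    def percentile_rank(scores_below, total_scores):
        # Percentile = (number of scores below the given score / total) * 100
        return scores_below / total_scores * 100

    print(percentile_rank(30, 50))   # 60.0: better than 60% of the 50 test takers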
Quartiles
Quartile is another term referred to in percentile measure.
The total of 100% is broken into four equal parts: 25%, 50%, 75%, and 100%.
Lower Quartile is the 25th percentile. (0.25)
Median Quartile is the 50th percentile. (0.50)
Upper Quartile is the 75th percentile. (0.75)
Standard Scores
Another method of indicating a student's relative position in a group is by showing how far the raw score is above or below average.
Basically, standard scores express test performance in terms
of standard deviation units from the mean.
Standard scores are scores that are based on mean and
standard deviation.
Types of standard scores
Z-scores are measures of how many standard deviations a
particular score is above or below the mean of a distribution.
e.g. a Z score of -1.5 would indicate that a score is 1.5 standard
deviations below the mean.
We define the z-score as: z = (X − X̄) / s
Where, X = the data value in question
X̄ = the sample mean
s = the sample standard deviation
For instance, if a person scored a 70 on a test with a mean of 50
and a standard deviation of 10, then they scored 2 standard
deviations above the mean.
So, a z score of 2 means the original score was 2 standard
deviations above the mean.
If the z-score is 0, then your data value is the mean.
If the z-score > 0 (positive), then your data value is above the mean.
If the z-score < 0 (negative), then your data value is below the mean.
Example. Almaz scored a 25 on her math test.
Suppose the mean for this exam is 21, with a standard deviation of
4.
Dawit scored 60 on an English test which had a mean of 50 with a
standard deviation of 5. Who did relatively better?
We will find the respective z-scores for Almaz and Dawit.
Almaz's z-score: (25 − 21) / 4 = 1
Dawit's z-score: (60 − 50) / 5 = 2
Since Dawit had the higher z-score, we say Dawit did relatively better.
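The comparison is easy to script (a sketch of the z-score formula above):

    def z_score(x, mean, sd):
        return (x - mean) / sd

    almaz = z_score(25, 21, 4)   # 1.0
    dawit = z_score(60, 50, 5)   # 2.0
    # The higher z-score marks the relatively better performance.
    print("Dawit did relatively better" if dawit > almaz else "Almaz did relatively better")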
T Scores: This refers to any set of normally distributed
standard scores that has a mean score of 50 and a standard
deviation of 10.
It is useful for making the z-score more convenient to report.
T-scores are commonly used to compare an individual's performance to a larger group.
The T – score is obtained by multiplying the Z-score by 10
and adding the product to 50.
That is, T – Score = 50 + 10(z).
A T-score of 60 is one standard deviation above the mean, while a T-score of 30 is two standard deviations below the mean.
Example
A test has a mean score of 40 and a standard deviation of 4. What are
the T – scores of two test takers who obtained raw scores of 30 and
45 respectively in the test?
Solution
The first step in finding the T-scores is to obtain the z-scores for the
test takers.
The z-scores would then be converted to the T – scores.
In the example above, for the test taker with a raw score of 30, the z-score is:
z = (X − M) / SD, where the symbols retain their usual meanings.
X = 30, M = 40, SD = 4.
Thus, z = (30 − 40) / 4 = −10 / 4 = −2.5
The T-score is then obtained by converting the z-score (−2.5):
T-score = 50 + 10(z)
        = 50 + 10(−2.5)
        = 50 − 25
        = 25
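A sketch of the conversion in Python; it also answers the second test taker's raw score of 45 from the example:

    def t_score(x, mean, sd):
        z = (x - mean) / sd      # convert the raw score to a z-score first
        return 50 + 10 * z       # then rescale: T = 50 + 10z

    print(t_score(30, 40, 4))    # 25.0
    print(t_score(45, 40, 4))    # 62.5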
3.2.5 Measures of Relationship
If we have two sets of scores from the same group of people, it is
often desirable to know the degree to which the scores are related.
For example, the relationship between the test scores of students
for the English Subject and their overall scores of other subjects.
The degree of relationship is expressed in terms of coefficient of
correlation.
The value ranges from -1.00 to +1.00.
A perfect positive correlation is indicated by a coefficient of
+1.00 and a perfect negative correlation by a coefficient of -1.00.
A correlation of .00 indicates no relationship between the two
sets of scores.
Obviously, the larger the coefficient (positive or negative), the higher the degree of relationship expressed.
There are several different measures of relationship expressed
as correlation coefficients.
One of these is the product-moment correlation coefficient,
which is by far the most commonly used and most useful
correlation coefficient.
It is indicated by the symbol r.
The formula for obtaining the coefficient of correlation is:
r = Σ(X − X̄)(Y − Ȳ) / (N · Sx · Sy)
Where, X = score of a person on one variable
Y = score of the same person on the other variable
X̄ = mean of the X distribution
Ȳ = mean of the Y distribution
Sx = standard deviation of the X scores
Sy = standard deviation of the Y scores
N = number of pairs of scores
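A minimal Python sketch of the deviation-score formula above; the two score lists are made up for illustration:

    import statistics

    def pearson_r(xs, ys):
        n = len(xs)
        mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
        sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)   # population SDs
        return sum((x - mean_x) * (y - mean_y)
                   for x, y in zip(xs, ys)) / (n * sx * sy)

    english = [60, 70, 80, 90]   # hypothetical English scores
    overall = [58, 75, 78, 92]   # hypothetical overall scores
    print(round(pearson_r(english, overall), 2))   # 0.97: strong positive relationship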
3.3 Characteristics of a good test
Validity: it refers to the extent to which a test serves its purpose(s), that is, measures what it is intended to measure, to the extent desired.
It is all about the extent to which assessment information can be trusted (truthfulness).
The following factors can influence the validity of a test:
• Unclear directions
• Inappropriate level of difficulty
• Poorly constructed items (clues within items)
• Test items inappropriate for the objectives being measured
• Improper arrangement of items
• Cheating in exams, emotional disturbance of examinees
Reliability: Test reliability refers to the accuracy, consistency
and stability of scores students would receive on alternate
forms of the same test.
The more consistent our test results are from one measurement
to another, the less error there will be and consequently, the
greater the reliability.
There are some factors that affect the reliability of a test, including the following:
• Test length: the longer a test is, the more reliable it is ( in that
wide coverage of content is ensured) but NOT TOO LONG
• Group heterogeneity: the more heterogeneous the group of test takers, the higher the reliability
• Irregularities: lighting conditions, a testee's failure to follow directions, etc.
• Objectivity: the fairness of a test to the testee; a biased test does not portray objectivity and hence is not reliable.
• A test that is objective has high validity and reliability.
• Discrimination: a good test must be able to make a distinction between poor and good learners; it should show the slight differences in learner attainment and achievement that make it possible to distinguish between poor and good learners.
• Comprehensiveness: test items should cover much of the content of the course, that is, the subject.
• Ease of administration: a good test should not pose difficulties in administration.
• Practicality and scoring: assigning quantitative value to a test result should not be difficult (why, what and how).
• Usability: a good test should be usable, unambiguous and clearly stated, with one meaning only.
Unit Four: Item Analysis
4.2. Analyzing test items
It is the process of examining or analyzing testees' responses to each item on a test, with the basic intent of judging the quality of the items.
Item analysis is the process of studying the characteristics of test items based on data obtained from examinees.
Item analysis indicates which item is difficult or easy, which item
effectively discriminates between high and low achievers and whether
the item functions as it was intended or not.
An item that is too easy, too difficult, or that fails to show a
difference between skilled and unskilled examinees should be revised
or discarded.
The two most common statistics reported in an item analysis are the
item difficulty and the item discrimination.
4.2.1. Item difficulty level index
Item difficulty refers to how easy or hard a test question is for a group of
examinees. It is measured by the proportion of examinees who answered
the item correctly.
P is the symbol for the item difficulty index.
The p-value ranges from 0 to 1:
• Closer to 1 → easy item
• Closer to 0 → difficult item
• Ideal item difficulty for most classroom tests is between 0.3 and 0.7.
When there is a sufficient number of scores available (i.e., 100 or more)
difficulty indexes are calculated using scores from the top and bottom 27
percent of the group.
To calculate the item difficulty level, first rank the papers in
order from the highest to the lowest score.
Conti…
An item difficulty level is determined by:
P = (CRU + CRL) / (NU + NL) × 100
where, P = difficulty index
CRU = number of correct responses from the upper group
(27% of the total respondents)
CRL = number of correct responses from the lower group
(27% of the total respondents)
NU = number of respondents in the upper group
NL = number of respondents in the lower group
Equivalently, P = (success in the HSG + success in the LSG) / (HSG + LSG)
where HSG = high-scoring group and LSG = low-scoring group.
Conti….
The difficulty indexes can range between 0.0 and 1.0 and
are usually expressed as a percentage.
A higher value indicates that a greater proportion of
examinees responded to the item correctly, and it was thus
an easier item.
For maximum discrimination among students, an average
difficulty of .60 is ideal.
An item difficulty of 1.00 indicates that everyone
answered correctly, while 0.00 means no one answered
correctly.
Item difficulty interpretation
P-Value                  Percent Range   Interpretation
≥ 0.75                   75–100          Easy
≤ 0.25                   0–25            Difficult
Between 0.25 and 0.75    26–74           Average
Example: Determination of Item difficulty
Options   Upper 27%   Lower 27%   Total
A         1           4           5
B**       23          17          40
C         1           3           4
D         2           3           5
Omit      0           0           0
Total     27          27          54
From the above data,
CRU = 23, NU = 27
CRL = 17, NL = 27
Then, P = (23 + 17)/(27 + 27) × 100 = 40/54 × 100 = 0.74 × 100 = 74%
Comment: The item is somewhat easy and appropriate for a
classroom achievement test.
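As an illustration, here is a minimal Python sketch of this calculation, including the interpretation bands from the table above. The function names are my own, not from the module.

```python
def item_difficulty(cr_upper, cr_lower, n_upper, n_lower):
    """Item difficulty index as a percentage:
    P = (CRU + CRL) / (NU + NL) * 100."""
    return (cr_upper + cr_lower) / (n_upper + n_lower) * 100

def interpret_difficulty(p):
    """Interpretation bands from the table above (p in percent)."""
    if p >= 75:
        return "easy"
    if p <= 25:
        return "difficult"
    return "average"

# Worked example above: CRU = 23, CRL = 17, NU = NL = 27
p = item_difficulty(23, 17, 27, 27)
print(round(p), interpret_difficulty(p))  # -> 74 average (a somewhat easy item)
```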
Activity: Calculate the item difficulty level for the following
four-option multiple-choice test item. (The sign (*) shows the
correct answer.)

Groups         A   B   C   D*   TOTAL
High Scorers   0   1   1   8    10
Low Scorers    1   1   5   3    10
Total          1   2   6   11   20
4.2.2. Item discrimination index
The index of discrimination is a numerical indicator that enables
us to determine whether the question discriminates appropriately
between lower scoring and higher scoring students.
Item discrimination refers to how well a test item differentiates
between high-performing and low-performing students.
D-value ranges from -1 to +1:
An item discriminates in a positive direction if more students in
the upper group than the lower group get the item right.
Positive discrimination index indicates that the item functions as
it is intended.
Conti…
D = (CRU - CRL) / NU or D = (CRU - CRL) / NL
From the preceding data,
CRU = 23, CRL = 17, NU = 27, NL = 27
Therefore, D = (23 - 17)/27 = 6/27 = 0.22
Comment: It is a marginal item that needs some improvement.
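A minimal Python sketch of this computation (the function name is my own):

```python
def discrimination_index(cr_upper, cr_lower, n_group):
    """Item discrimination index D = (CRU - CRL) / N,
    where N is the size of one group (NU = NL)."""
    return (cr_upper - cr_lower) / n_group

# Preceding data: CRU = 23, CRL = 17, N = 27
print(round(discrimination_index(23, 17, 27), 2))  # -> 0.22 (marginal)
```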
Conti…
Item discrimination index can also be calculated using the following
formula:
D = (success in the HSG - success in the LSG) / ½ (HSG + LSG)
Where, HSG = High Scoring Group
LSG = Low Scoring Group
An item will have a maximum positive discriminating power if all
students from the upper group get the item right and all from the lower
group miss it.
That is, D = (27 - 0)/27 = 1
An item will have no discriminating power if all students from both the
upper and lower groups get the item right, or all miss it.
That is, D = (27 - 27)/27 = 0 or D = (0 - 0)/27 = 0
An item will have negative discriminating power if more students from
the lower group than the upper group get the item right. Such items
should be revised so that they discriminate positively, or discarded.
Moreover, items answered correctly or incorrectly by all examinees can’t
discriminate at all and should be revised so that they discriminate, or
discarded.
Conti…
The item discrimination index can vary from -1.00 to +1.00.
A negative discrimination index (between -1.00 and zero)
results when more students in the low group answered correctly
than students in the high group.
A discrimination index of zero means equal numbers of high and
low students answered correctly, so the item did not discriminate
between groups.
A positive index occurs when more students in the high group
answer correctly than the low group.
If the students in the class are fairly homogeneous in ability and
achievement, their test performance is also likely to be similar,
resulting in little discrimination between high and low groups.
Generally, indices of item discrimination can be evaluated in
the following terms, as suggested by Ebel (1972).
Table: Discrimination Indices for Item Evaluation

Index of Discrimination   Item Evaluation
0.40 and up               Very good
0.30 – 0.39               Good
0.20 – 0.29               Marginal; needs improvement
Below 0.19                Poor items (should be discarded)

Therefore, very good classroom test items should have indices of
discrimination of 0.40 or above. Poor items to be rejected or improved by
revision are those with indices of discrimination below 0.19.
Item discrimination interpretation
D-Value         Interpretation
> +.40          Positive, strong
+.20 to +.40    Positive, moderate
+.10 to +.20    Marginal; needs improvement
< +.10          Poor items; should be discarded
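As a sketch, Ebel’s guidelines can be turned into a small helper. The thresholds are taken from the Ebel (1972) table above; the function name is my own.

```python
def evaluate_item(d):
    """Classify a discrimination index D using Ebel's (1972) bands."""
    if d >= 0.40:
        return "very good item"
    if d >= 0.30:
        return "good item"
    if d >= 0.20:
        return "marginal item; needs improvement"
    return "poor item; reject or improve by revision"

print(evaluate_item(0.22))  # -> marginal item; needs improvement
```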
4.3. Evaluating the Effectiveness of Distracters
Distracters should be as attractive as the correct answer.
In a properly constructed multiple-choice item, each distracter
will be selected by some students. Specifically, it should attract
more students from the lower group than from the upper group.
If a distracter is not selected by anyone, it makes no contribution
to the functioning of the item and should be eliminated or revised.
A distractor analysis evaluates the effectiveness of the
distracters in each item by comparing the number of students
in the upper and lower groups who selected each incorrect
alternative (a good distracter will attract more students from
the lower group than the upper group).
Example: Evaluating the Effectiveness of Distracters
Options   Upper 27%   Lower 27%   Total
A**       12          7           19
B         12          10          22
C         0           0           0
D         3           10          13
Omit      0           0           0

Difficulty Index (P) = (CRU + CRL) / (NU + NL) × 100
P = (12 + 7)/(27 + 27) × 100 = 19/54 × 100 = 0.352 × 100 = 35.2%
Discriminating Power (D) = (CRU - CRL) / NU
D = (12 - 7)/27 = 5/27 = 0.185 ≈ 0.19
Comment:
1. The item discriminates in a positive direction, since 12 in the
upper group and 7 in the lower group got the item right, but the
index of discriminating power (D) is low.
2. The item is difficult for a classroom achievement test.
3. Alternative “B” is a poor distracter because it attracts more
students from the upper group than from the lower group.
4. Alternative “C” is inefficient since it attracted no one.
5. Alternative “D” is functioning as intended, for it attracted more
students from the lower group than from the upper group.
6. The discriminating power of the item may be improved by
removing ambiguity in the statement of the item and replacing
alternatives “B” and “C”.
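A minimal Python sketch of a distracter analysis along these lines, using the upper/lower counts from the example above (the function name and output wording are my own):

```python
def analyze_distractors(counts, key):
    """Judge each incorrect option: a good distracter draws more
    students from the lower group than from the upper group."""
    for option, (upper, lower) in counts.items():
        if option == key:
            continue  # skip the correct answer
        if upper + lower == 0:
            verdict = "inefficient: attracted no one; eliminate or revise"
        elif upper >= lower:
            verdict = "poor: attracts the upper group more; revise"
        else:
            verdict = "functioning as intended"
        print(f"Option {option} (upper={upper}, lower={lower}): {verdict}")

# Upper/lower 27% counts from the example above; the key is A
analyze_distractors(
    {"A": (12, 7), "B": (12, 10), "C": (0, 0), "D": (3, 10)},
    key="A",
)
```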
Unit Five: Ethical Standards of Assessment
5.2. Ethical and Professional Standards of Assessment and its
Use
Ethical standards guide teachers in fulfilling their
professional obligation to provide and use tests that are fair
to all test takers regardless of age, gender, disability,
ethnicity, religion, linguistic background, or other personal
characteristics.
Teachers should be fair in different aspects of assessment.
Conti…
The following are some ethical standards that teachers may
consider in their assessment practices.
Teachers should be skilled in choosing assessment methods
that enable them to make appropriate instructional
decisions.
Teachers need to be well-acquainted with the kinds of
information provided by a broad range of assessment
alternatives and their strengths and weaknesses.
In particular, they should be familiar with criteria for
evaluating and selecting assessment methods in light of
instructional plans.
Conti…
Teachers should develop tests that meet the intended
purpose and that are appropriate for the intended test
takers.
The teacher should be skilled in administering, scoring and
interpreting the results from diverse assessment methods.
It is not enough that teachers are able to select and develop
good assessment methods; they must also be able to apply
them properly.
Teachers should be skilled in communicating assessment
results to students, parents, and other educators.
5.3. Ethnicity and Culture in tests and assessments
Students represent a variety of cultural and linguistic
backgrounds. If the cultural and linguistic backgrounds are
ignored, students may become alienated or disengaged from the
learning and assessment process.
Teachers need to be aware of how such backgrounds may
influence student performance and the potential impact on
learning.
Classroom assessment practices should be sensitive to the
cultural and linguistic diversity of students in order to obtain
accurate information about their learning.
Assessment practices that attend to issues of cultural diversity
include those that
Conti…
acknowledge students’ cultural backgrounds.
are sensitive to those aspects of an assessment that may
hamper students’ ability to demonstrate their knowledge and
understanding.
use that knowledge to adjust or scaffold assessment practices
if necessary.
Assessment practices that attend to issues of linguistic
diversity include those that
acknowledge students’ differing linguistic abilities.
Conti…
use that knowledge to adjust or scaffold assessment practices
if necessary.
use assessment practices in which the language demands do
not unfairly prevent the students from understanding what is
expected of them.
Teachers must make every effort to address and minimize the
effect of bias in classroom assessment practices.
Conti…
Assessment should be culturally and linguistically
appropriate, fair and bias-free.
For an assessment task to be fair, its content, context, and
performance expectations should:
reflect knowledge, values, and experiences that are equally
familiar and appropriate to all students;
tap knowledge and skills that all students have had adequate
time to acquire;
be as free as possible of cultural and ethnic stereotypes.
5.4. Disability and Assessment Practices
Inclusive education is based on the idea that all students,
including those with disabilities, should be provided with the
best possible education to develop themselves.
This calls for the provision of all possible accommodations
to address the educational needs of disabled students.
Accommodations should not refer only to the teaching and
learning process; they should also cover the assessment
mechanisms and procedures.
Conti…
There are different strategies that can be considered to make
assessment practices accessible to students with disabilities
depending on the type of disability.
In general terms, however, the following strategies could be
considered in summative assessments:
Modifying assessments: - This should enable disabled
students to have full access to the assessment without giving
them any unfair advantage.
Conti…
Others’ support: - Disabled students may need the support of
others in certain assessment activities which they cannot do
independently.
For instance, they may require readers and scribes in written
exams; they may also need others’ assistance in practical activities,
such as using equipment, locating materials, drawing, and
measuring.
Time allowances: - Disabled students should be given additional
time to complete their assessments; how much extra time is
appropriate is for the individual instructor to decide, based on the
purpose and nature of the assessment.
Rest breaks: Some students may need rest breaks during the
examination. This may be to relieve pain or to attend to personal
needs.
Conti…
Flexible schedules: In some cases disabled students may require
flexibility in the scheduling of examinations. For example, some
students may find it difficult to manage a number of examinations
in quick succession and need to have examinations scheduled over a
period of days.
Alternative methods of assessment: - In certain situations where
formal methods of assessment may not be appropriate for disabled
students, the instructor should assess them using non-formal
methods such as class work, portfolios, oral presentations, etc.
Assistive Technology: Specific equipment may need to be available
to the student in an examination. Such arrangements often include
the use of personal computers, voice-activated software and screen
readers.
5.5 Gender issues in assessment
Teachers’ assessment practices can also be affected by gender
stereotypes.
The issues of gender bias and fairness in assessment are
concerned with differences in opportunities for boys and girls.
A test is biased if boys and girls with the same ability levels
tend to obtain different scores.
Test questions should be checked for:
Material or references that may be offensive to members of
one gender,
References to objects and ideas that are likely to be more
familiar to men or to women
Conti…
unequal representation of men and women as actors in test
items or representation of members of each gender only in
stereotyped roles.
If the questions involve objects and ideas that are more
familiar or less offensive to members of one gender, then the
test may be easier for individuals of that gender.
Standards for achievement on such a test may be unfair to
individuals of the gender that is less familiar with or more
offended by the objects and ideas discussed, because it may
be more difficult for such individuals to demonstrate their
abilities or their knowledge of the material.