

Firdauz

Introduction:

This course focuses on the principles, development, and utilization of conventional assessment tools to
improve the teaching-learning process. It emphasizes the use of assessment of, as, and for learning in
measuring knowledge, comprehension, and other thinking skills in the cognitive, psychomotor, and
affective domains. It allows the students to go through the standard steps in test construction and
development and their application in grading systems.

Course Outcomes:
1. Discuss the characteristics of the different concepts in assessment of learning
2. Distinguish measurement, evaluation, and assessment in varied classroom settings
3. Characterize assessments that will influence instructional decisions
4. Cite differences between standardized testing and classroom assessments
5. Recall the taxonomy of educational objectives by Bloom
6. Distinguish desirable qualities of good test instruments as bases for judging the quality of
classroom assessment
7. Plan a design of a table of specifications for the (major) subject to be tested
8. Describe the differences between norm-referenced and criterion-referenced interpretations of
assessment performance in terms of how the scores are reported
9. Explain the meaning and function of the measures of central tendency and variability
10. Explain the meaning of normal and skewed score distribution

Course Outline:
MODULE 1 – BASIC CONCEPTS IN ASSESSMENT OF LEARNING
MODULE 2 – PRINCIPLES OF HIGH-QUALITY CLASSROOM ASSESSMENT
MODULE 3 – DEVELOPMENT OF CLASSROOM TOOLS FOR MEASURING KNOWLEDGE AND
UNDERSTANDING
MODULE 4 – DESCRIPTION OF ASSESSMENT DATA
MODULE 5 – INTERPRETATION AND UTILIZATION OF TEST RESULTS


Prelim
Topics
MODULE 1 – BASIC CONCEPTS IN ASSESSMENT OF LEARNING
MODULE 2 - PRINCIPLES OF HIGH-QUALITY CLASSROOM ASSESSMENT
Pre-Assessment Learning Activities:
Answer the following. Write your answers on a clean sheet of paper.
1. What is the concept of Assessment?
2. What is the purpose of assessment?
3. What is assessment in education?

Lesson Discussion:

ASSESSMENT OF LEARNING 1

Assessment – refers to the process of gathering, describing, or quantifying information about student
performance. It includes paper-and-pencil tests, extended responses (e.g., essays), and performance
assessment tasks, usually referred to as "authentic assessment" tasks (e.g., presentation of research work).

Measurement – is a process of obtaining a numerical description of the degree to which an individual
possesses a particular characteristic. Measurement answers the question, "How much?"

Evaluation – refers to the process of examining the performance of students. It also determines whether
or not the student has met the lesson's instructional objectives.

Test – is an instrument or systematic procedure designed to measure the quality, ability, skill, or
knowledge of students by giving a set of questions in a uniform manner. Since a test is a form of assessment,
tests also answer the question, "How does an individual student perform?"

Testing – is a method used to measure the level of achievement or performance of the learners. It also
refers to the administration, scoring, and interpretation of an instrument (procedure) designed to elicit
information about performance in a sample of a particular area of behavior.

Types of Measurement

There are two ways of interpreting student performance in relation to classroom instruction: norm-referenced
tests and criterion-referenced tests.

A norm-referenced test is a test designed to measure the performance of a student compared with other
students. Each individual is compared with other examinees and assigned a score, usually expressed as a
percentile, a grade-equivalent score, or a stanine. Student achievement is reported for broad skill
areas, although some norm-referenced tests do report achievement for individual skills.

The purpose is to rank each student with respect to the achievement of others in broad areas of knowledge
and to discriminate between high and low achievers.
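The norm-referenced score types mentioned above can be made concrete with a short computation. The sketch below is an illustration, not part of the source: the raw scores are invented, percentile rank is taken as the percent of examinees scoring below a given score plus half of those tied with it, and stanines use the standard 4-7-12-17-20-17-12-7-4 percent bands.

```python
# Sketch: turning raw scores into norm-referenced scores.
# The score list below is invented for illustration.

def percentile_rank(scores, x):
    """Percent of examinees scoring below x, counting half of those tied at x."""
    below = sum(1 for s in scores if s < x)
    tied = sum(1 for s in scores if s == x)
    return 100.0 * (below + 0.5 * tied) / len(scores)

def stanine(pr):
    """Map a percentile rank to a stanine (1-9) using the standard cut points."""
    cuts = [4, 11, 23, 40, 60, 77, 89, 96]  # cumulative percent boundaries
    return 1 + sum(1 for c in cuts if pr >= c)

scores = [55, 60, 62, 65, 70, 72, 75, 80, 85, 90]
pr = percentile_rank(scores, 75)
print(pr, stanine(pr))  # a raw score of 75 here is the 65.0th percentile, stanine 6
```

Grade-equivalent scores, the third score type named above, require norming data collected across grade levels and are not sketched here.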

A criterion-referenced test is a test designed to measure the performance of students with respect to some
particular criterion or standard. Each individual is compared with a predetermined set of standards for
acceptable achievement; the performance of the other examinees is irrelevant. A student's score is
usually expressed as a percentage, and student achievement is reported for individual skills.

The purpose is to determine whether each student has achieved specific skills or concepts, and to find out
how much students know before instruction begins and after it has finished.

Other terms less often used for criterion-referenced are objective-referenced, domain-referenced, content-referenced,
and universe-referenced.


Robert L. Linn and Norman E. Gronlund (1995) pointed out the common characteristics and
differences of norm-referenced tests and criterion-referenced tests.

Common Characteristics of Norm-Referenced Test and Criterion-Referenced Tests

1. Both require specification of the achievement domain to be measured.
2. Both require a relevant and representative sample of test items.
3. Both use the same types of test items.
4. Both use the same rules for item writing (except for item difficulty).
5. Both are judged by the same qualities of goodness (validity and reliability).
6. Both are useful in educational assessment.

Differences between Norm-Referenced Tests and Criterion Referenced Tests

Norm-Referenced Tests
1. Typically cover a large domain of learning tasks, with just a few items measuring each specific task.
2. Emphasize discrimination among individuals in terms of relative level of learning.
3. Favor items of average difficulty and typically omit very easy and very hard items.
4. Interpretation requires a clearly defined group.

Criterion-Referenced Tests
1. Typically focus on a delimited domain of learning tasks, with a relatively large number of items measuring each specific task.
2. Emphasize description of what learning tasks individuals can and cannot perform.
3. Match item difficulty to learning tasks, without altering item difficulty or omitting easy or hard items.
4. Interpretation requires a clearly defined and delimited achievement domain.

TYPES OF ASSESSMENT

There are four types of assessment in terms of their functional role in relation to classroom instruction:
placement assessment, diagnostic assessment, formative assessment, and summative assessment.

A. Placement Assessment is concerned with the entry performance of students. The purpose of
placement evaluation is to determine the prerequisite skills, degree of mastery of the course
objectives, and the best mode of learning.
B. Diagnostic Assessment is a type of assessment given before instruction. It aims to identify the
strengths and weaknesses of the students regarding the topics to be discussed. The purposes of
diagnostic assessment are:


1. To determine the level of competence of the students;
2. To identify the students who already have knowledge of the lesson;
3. To determine the causes of learning problems and formulate a plan for remedial action.
C. Formative Assessment is a type of assessment used to monitor the learning progress of the
students during or after instruction. The purposes of formative assessment are:
1. To provide immediate feedback to both student and teacher regarding the success and
failure of learning;
2. To identify the learning errors that need correction;
3. To provide information to the teacher for modifying instruction and improving
learning and instruction.
D. Summative Assessment is a type of assessment usually given at the end of a course or unit.
The purposes of summative assessment are:
1. To determine the extent to which the instructional objectives have been met;
2. To certify student mastery of the intended outcomes and assign grades;
3. To provide information for judging the appropriateness of the instructional objectives;
4. To determine the effectiveness of instruction.

MODULE 2 - PRINCIPLES OF HIGH-QUALITY CLASSROOM ASSESSMENT

1. Clarity of Learning Targets
2. Appropriateness of Assessment Methods
3. Validity
4. Reliability
5. Fairness
6. Positive Consequences
7. Practicality and Efficiency
8. Ethics

1. CLARITY OF LEARNING TARGETS

Assessment can be made precise, accurate, and dependable only if what is to be achieved is
clearly stated and feasible. The learning targets, involving knowledge, reasoning, skills, products,
and effects, need to be stated in behavioral terms which denote something that can be observed
in the behavior of the students.

Cognitive Targets

Benjamin Bloom (1956) proposed a hierarchy of educational objectives at the cognitive level.
These are:

Knowledge – acquisition of facts, concepts and theories

Comprehension - understanding, involves cognition or awareness of the interrelationships


Application – transfer of knowledge from one field of study to another or from one concept to
another concept in the same discipline

Analysis – breaking down of a concept or idea into its components and explaining the concept
as a composition of these components

Synthesis – opposite of analysis, entails putting together the components in order to summarize
the concept

Evaluation and Reasoning – valuing and judgment, or putting a "worth" on a concept or
principle.

Skills, Competencies and Abilities Targets

Skills – specific activities or tasks that a student can proficiently do

Competencies – clusters of skills

Abilities – made up of related competencies, categorized as:

 Cognitive
 Affective
 Psychomotor

Products, Outputs and Project Targets

 Tangible and concrete evidence of a student’s ability
 Need to clearly specify the level of workmanship of projects:
 expert
 skilled
 novice

2. APPROPRIATENESS OF ASSESSMENT METHODS

Written-Response Instruments
Objective tests – appropriate for assessing the various levels of the hierarchy of educational
objectives

Essays – can test the students’ grasp of the higher level cognitive skills

Checklists – lists of several characteristics or activities presented to the subjects of a study, who
analyze and place a mark opposite each characteristic that applies.

Product Rating Scales

 Used to rate products like book reports, maps, charts, diagrams, notebooks, creative
endeavors
 Need to be developed to assess various products over the years


Performance Tests - Performance checklist

 Consists of a list of behaviors that make up a certain type of performance
 Used to determine whether or not an individual behaves in a certain way when asked to
complete a particular task

Oral Questioning – appropriate assessment method when the objectives are to:

 Assess the students’ stock knowledge and/or
 Determine the students’ ability to communicate ideas in coherent verbal sentences.

Observation and Self-Reports

 Useful supplementary methods when used in conjunction with oral questioning and
performance tests

3. VALIDITY

 Something valid is something fair.
 A valid test is one that measures what it is supposed to measure.

Types of Validity

 Face: What do students think of the test?
 Construct: Am I testing in the way I taught?
 Content: Am I testing what I taught?
 Criterion-related: How does this compare with an existing valid test?
 Tests can be made more valid by making them more subjective (open items).

MORE ON VALIDITY

Validity – the appropriateness, correctness, meaningfulness, and usefulness of the specific
conclusions that a teacher reaches regarding the teaching-learning situation.

Content validity – content and format of the instrument

 Students’ adequate experience
 Coverage of sufficient material
 Reflects the degree of emphasis

Face validity – outward appearance of the test, the lowest form of test validity

Criterion-related validity – the test is judged against a specific criterion

Construct validity – the test is loaded on a "construct" or factor



4. RELIABILITY

 Something reliable is something that works well and that you can trust.
 A reliable test is a consistent measure of what it is supposed to measure.

Questions:
 Can we trust the results of the test?
 Would we get the same results if the tests were taken again and scored by a different
person?

Tests can be made more reliable by making them more objective (controlled items).

 Reliability is the extent to which an experiment, test, or any measuring procedure yields
the same result on repeated trials.

 Equivalency reliability is the extent to which two items measure identical concepts at an
identical level of difficulty. Equivalency reliability is determined by relating two sets of
test scores to one another to highlight the degree of relationship or association.

 Stability reliability (sometimes called test-retest reliability) is the agreement of
measuring instruments over time. To determine stability, a measure or test is repeated on
the same subjects at a future date.

 Internal consistency is the extent to which the items within a test or procedure assess the
same characteristic, skill, or quality. It is a measure of the precision of the measuring
instrument itself.

 Interrater reliability is the extent to which two or more individuals (coders or raters)
agree. Interrater reliability addresses the consistency of the implementation of a rating
system.
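To make the interrater idea concrete, here is a minimal sketch (the ratings below are invented, not from the source) computing simple percent agreement and Cohen's kappa, a common chance-corrected agreement statistic:

```python
# Sketch: interrater reliability for two raters over the same six papers.
# The ratings are invented for illustration.
from collections import Counter

def cohen_kappa(r1, r2):
    """Chance-corrected agreement between two raters."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

rater1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(round(agreement, 2), round(cohen_kappa(rater1, rater2), 2))  # 0.83 0.67
```

Kappa comes out lower than raw agreement because two raters who mostly mark "pass" would agree fairly often by chance alone.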

RELIABILITY – CONSISTENCY, DEPENDABILITY, STABILITY, WHICH CAN BE
ESTIMATED BY:

 Split-half method – the scores on the two halves of a single test are correlated, and the
full-test reliability is calculated using the Spearman-Brown prophecy formula or the
Kuder-Richardson formulas (KR-20 and KR-21)

 Test-retest method – the consistency of test results when the same test is administered at
two different time periods, estimated by correlating the two sets of results
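The estimates above can be sketched in a few lines of code. This is an illustration only: the 0/1 item-response matrix is invented, and the functions implement the split-half method (stepped up with the Spearman-Brown prophecy formula) and KR-20 for right/wrong-scored items.

```python
# Sketch: reliability estimates from a single test administration.
# Rows are students, columns are items (1 = correct, 0 = wrong); data invented.
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def split_half_reliability(items):
    """Correlate odd- and even-item half scores, then apply Spearman-Brown."""
    odd = [sum(row[0::2]) for row in items]
    even = [sum(row[1::2]) for row in items]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown prophecy formula

def kr20(items):
    """Kuder-Richardson formula 20 for dichotomous (right/wrong) items."""
    k = len(items[0])
    var_total = pstdev([sum(row) for row in items]) ** 2
    pq = sum(p * (1 - p) for p in (mean(col) for col in zip(*items)))
    return (k / (k - 1)) * (1 - pq / var_total)

items = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 1],
]
print(round(split_half_reliability(items), 2), round(kr20(items), 2))  # 0.58 0.64
```

KR-21 is a further simplification of KR-20 that assumes all items are equally difficult, so it needs only the test mean, variance, and number of items.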

5. FAIRNESS

The concept that assessment should be 'fair' covers a number of aspects.


 Students’ knowledge of the learning targets and assessments
 Opportunity to learn
 Prerequisite knowledge and skills
 Avoiding teacher stereotypes
 Avoiding bias in assessment tasks and procedures

6. POSITIVE CONSEQUENCES

Learning assessments provide students with effective feedback and potentially improve their
motivation and/or self-esteem. Moreover, assessments of learning give students the tools to
assess themselves and understand how to improve, yielding positive consequences for students,
teachers, parents, and other stakeholders.

7. PRACTICALITY AND EFFICIENCY

 Something practical is something effective in real situations.
 A practical test is one which can be practically administered.

Questions:

 Will the test take longer to design than to administer?
 Will the test be easy to mark?

Tests can be made more practical by making them more objective (more controlled items).

 Teacher familiarity with the method
 Time required
 Complexity of administration
 Ease of scoring
 Ease of interpretation
 Cost

Teachers should be familiar with the test; it should not require too much time and should be
implementable.

8. ETHICS

 Informed consent
 Anonymity and Confidentiality

1. Gathering data
2. Recording Data
3. Reporting Data

ETHICS IN ASSESSMENT – "RIGHT AND WRONG"


 Conforming to the standards of conduct of a given profession or group
 Ethical issues that may be raised

1. Possible harm to the participants
2. Confidentiality
3. Presence of concealment or deception
4. Temptation to assist students

Learning References:

[1] Navaro, R.L., Santos, R.G., and Corpuz, B.B. (2019). Assessment of Learning, OBE & PPST Based, Fourth Edition. Lorimar Publishing, Inc.
[2] Bartlett, J. (2015). Outstanding Assessment for Learning in the Classroom. Routledge, Taylor & Francis Group.
[3] Frey, N. and Fisher, D. (2011). The Formative Assessment Action Plan. USA: ASCD.
[4] Lewin, L. and Shoemaker, B. (2011). Great Performances: Creating Classroom-Based Assessment Tasks, 2nd Edition. USA: ASCD.
[5] Ecclestone, K. et al. (2010). Transforming Formative Assessment in Lifelong Learning. UK: McGraw-Hill Open University Press.
[6] Airasian, P.W. (2005). Classroom Assessment: Concepts and Applications. New York: McGraw-Hill Companies, Inc.
[7] Shermis, M.D. and Di Vesta, F.J. (2011). Classroom Assessment in Action. Rowman & Littlefield Publishers, Inc.
[8] Anderson, L.W. (2003). Classroom Assessment: Enhancing the Quality of Teacher Decision Making. Lawrence Erlbaum Associates, Publishers.

Post-Assessment Learning Activities: Answer the following. Write your answers on a clean sheet of paper.
1. Summarize the topics covered in the Prelim period.
2. In your own understanding, what is assessment all about?
3. What is the difference between norm-referenced and criterion-referenced interpretation?
4. Differentiate validity and reliability.
5. Give at least 2 common characteristics of norm-referenced and criterion-referenced tests, then discuss them.

Midterm
Topics: MODULE 3 – DEVELOPMENT OF CLASSROOM TOOLS FOR MEASURING KNOWLEDGE AND
UNDERSTANDING

Pre-Assessment Learning Activities: Answer the following. Write your answers on a clean sheet of paper.
1. What are classroom tools? Discuss your answer.

Lesson Discussion:

DIFFERENT TYPES OF TESTS


MAIN POINTS FOR COMPARISON: TYPES OF TESTS

Purpose
Psychological Test
 Aims to measure students’ intelligence or mental ability, to a large degree without reference to what the student has learned
 Measures the intangible characteristics of an individual (e.g. aptitude tests, personality tests, intelligence tests)
Educational Test
 Aims to measure the results of instruction and learning (e.g. performance tests)

Scope of Content
Survey Test
 Covers a broad range of objectives
 Measures general achievement in certain subjects
 Constructed by trained professionals
Mastery Test
 Covers a specific objective
 Measures fundamental skills and abilities
 Typically constructed by the teacher

Interpretation
Norm-Referenced
 Results are interpreted by comparing one student’s performance with other students’
 Some will really pass
 There is competition for a limited percentage of high scores
 Describes a pupil’s performance compared to others
Criterion-Referenced
 Results are interpreted by comparing a student’s performance against a predefined performance standard
 All or none may pass
 There is no competition for a limited percentage of high scores
 Describes a pupil’s mastery of course objectives

Language Mode
Verbal
 Words are used by students in attaching meaning to or responding to test items
Non-Verbal
 Students do not use words in attaching meaning to or responding to test items (e.g. graphs, numbers, 3-D objects)

Construction
Standardized
 Constructed by a professional item writer
 Covers a broad range of content in a subject area
 Uses mainly multiple choice items
 Items are screened and the best are chosen for the final instrument
 Can be scored by a machine
 Interpretation of results is usually norm-referenced
Informal
 Constructed by a classroom teacher
 Covers a narrow range of content
 Various types of items are used
 Teacher picks or writes items as needed for the test
 Scored manually by the teacher
 Interpretation is usually criterion-referenced

Manner of Administration
Individual
 Mostly given orally or requires actual demonstration of skill
 One-on-one situations, hence many opportunities for clinical observation
 Chance to follow up the examinee’s response in order to clarify or comprehend it more clearly
Group
 A paper-and-pen test
 Loss of rapport, insight and knowledge about each examinee
 The same amount of time needed to gather information from one student serves to gather information from the whole group

Effect of Biases
Objective
 The scorer’s personal judgement does not affect the scoring
 Worded so that only one answer is acceptable
 Little or no disagreement on what is the correct answer
Subjective
 Affected by the scorer’s personal opinions, biases and judgement
 Several answers are possible
 Disagreement on what is the correct answer is possible

Time Limit and Level of Difficulty
Power
 Consists of a series of items arranged in ascending order of difficulty
 Measures a student’s ability to answer more and more difficult items
Speed
 Consists of items approximately equal in difficulty
 Measures a student’s speed or rate and accuracy in responding

Format
Selective
 There are choices for the answer
 Multiple choice, true or false, matching type
 Can be answered quickly
 Prone to guessing
 Time-consuming to construct
Supply
 There are no choices for the answer
 Short answer, completion, restricted or extended essay
 May require a longer time to answer
 Less chance of guessing but prone to bluffing
 Time-consuming to answer and score

TYPES OF TESTS ACCORDING TO FORMAT

1. Selective Type – provides choices for the answer
a. Multiple Choice – consists of a stem, which describes the problem,
and 3 or more alternatives, which give the suggested solutions. The incorrect
alternatives are the distracters.
b. True-False or Alternative Response – consists of a declarative statement
that one has to mark true or false, right or wrong, correct or incorrect, yes or no, fact or
opinion, and the like.
c. Matching Type – consists of two parallel columns: Column A, the
column of premises from which a match is sought; Column B, the column of responses
from which the selection is made.
2. Supply Test
a. Short Answer – uses a direct question that can be answered by a word,
phrase, a number, or a symbol
b. Completion Test – consists of an incomplete statement
3. Essay Test
a. Restricted Response – limits the content of the response by
restricting the scope of the topic
b. Extended Response – allows the students to select any factual
information that they think is pertinent, to organize their answers in accordance with their best
judgement
Projective Test
 A psychological test that uses images in order to evoke responses from a subject and
reveal hidden aspects of the subject’s mental life

 These were developed in an attempt to eliminate some of the major problems
inherent in the use of self-report measures, such as the tendency of some
respondents to give "socially desirable" responses.

Important Projective Techniques


1. Word Association Test. An individual is given a clue or hint and asked to respond
with the first thing that comes to mind.
2. Completion Test. The respondents are asked to complete an incomplete
sentence or story. The completion will reflect their attitude and state of mind.
3. Construction Techniques (Thematic Apperception Test). More or less like a
completion test: respondents are given a picture and asked to write a story about it. The
initial structure is limited and not detailed like the completion test. For example, 2 cartoons are
given and a dialogue is to be written.
4. Expression Techniques. Respondents are asked to express the feelings or
attitudes of other people.

GUIDELINES FOR CONSTRUCTING TEST ITEMS


When to Use Essay Tests
Essays are appropriate when:
1. the group to be tested is SMALL and the test is NOT TO BE USED again;
2. you wish to encourage and reward the development of students’ SKILL IN
WRITING;
3. you are more interested in exploring the students’ ATTITUDES than in
measuring their academic achievement;
4. you are more confident of your ability as a critical and fair reader than as an
imaginative writer of good objective test items.

When to Use Objective Test Items

Objective test items are especially appropriate when:

1. the group to be tested is LARGE and the test may be REUSED;
2. HIGHLY RELIABLE TEST SCORES must be obtained as efficiently as possible;
3. IMPARTIALITY of evaluation, ABSOLUTE FAIRNESS, and FREEDOM from
possible test SCORING INFLUENCES (fatigue, lack of anonymity) are essential;
4. you are more confident of your ability to express objective test items clearly
than of your ability to judge essay test answers correctly;
5. there is more PRESSURE FOR SPEEDY REPORTING OF SCORES than for speedy
test preparation.

Multiple Choice Items

 It consists of:
1. Stem – identifies the question or problem
2. Response alternatives or options
3. Correct answer

Example:
Which of the following is a chemical change? (STEM)
a. Evaporation of alcohol (ALTERNATIVES)
b. Freezing of water
c. Burning of oil
d. Melting of wax

Advantages of Using Multiple Choice Items

Multiple choice items can provide:
1. Versatility in measuring all levels of cognitive ability
2. Highly reliable test scores
3. Scoring efficiency and accuracy
4. Objective measurement of student achievement or ability
5. A wide sampling of content or objectives
6. A reduced guessing factor when compared to true-false items
7. Different response alternatives which can provide diagnostic feedback

Limitations of Multiple Choice Items

1. Difficult and time-consuming to construct
2. May lead a teacher to favour simple recall of facts
3. Place a high degree of dependence on students’ reading ability and the teacher’s
writing ability
SUGGESTIONS FOR WRITING MULTIPLE CHOICE ITEMS
1. When possible, state the stem as a direct question rather than as an incomplete
statement.
Poor: Alloys are ordinarily produced by…
Better: How are alloys ordinarily produced?
2. Present a definite, explicit singular question or problem in the stem.
Poor: Psychology…
Better: The science of mind and behaviour is called…
3. Eliminate excessive verbiage or irrelevant information from the stem.
Poor: While ironing her formal polo shirt, June burned her hand accidentally on
the hot iron. This was due to a heat transfer because…
Better: Which of the following ways of heat transfer explains why June’s hand
was burned after she touched a hot iron?
4. Include in the stem any word(s) that might otherwise be repeated in each
alternative.
Poor:
In national elections in the US, the President is officially
a. Chosen by the people
b. Chosen by the electoral college
c. Chosen by members of the Congress
d. Chosen by the House of Representatives
Better:

In national elections in the US, the President is officially chosen by


a. the people
b. the electoral college
c. members of the Congress
d. the House of Representatives
5. Use negatively stated questions sparingly. When used, underline and/or
capitalize the negative word.
Poor: Which of the following is not cited as an accomplishment of the Arroyo
administration?
Better: Which of the following is NOT cited as an accomplishment of the Arroyo
administration?
6. Make all alternatives plausible and attractive to the less knowledgeable or skilful
student.
What process is most nearly the opposite of photosynthesis?
Poor:
a. Digestion
b. Relaxation
c. Respiration
d. Exertion
Better:
a. Digestion
b. Assimilation
c. Respiration
d. Catabolism
7. Make the alternative grammatically parallel with each other and consistent with
the stem.
Poor: What would advance the application of atomic discoveries to medicine?
a. Standardized techniques for treatment of patients
b. Train the average doctor to apply the radioactive treatments
c. Remove the restriction of the use of radioactive substances
d. Establishing hospital staffed by highly trained radioactive therapy
specialist.
Better: What would advance the application of atomic discoveries to medicine?
a. Development of standardized techniques for treatment of patients
b. Removal of restriction on the use of radioactive substances
c. Addition of trained radioactive therapy specialist to hospital staffs
d. Training the average doctor in applicant of radioactive treatments.
8. Make the alternatives mutually exclusive.
Poor: The daily minimum required amount of milk that a 10-year-old should
drink is
a. 1-2 glasses
b. 2-3 glasses*
c. 3-4 glasses*

d. At least 4 glasses
Better: What is the daily minimum required amount of milk a 10-year-old child
should drink?
a. 1 glass
b. 2 glasses
c. 3 glasses
d. 4 glasses
9. When possible, present alternatives in some logical order (chronological, most to
least, alphabetical).
At 7 a.m. two trucks leave a diner and travel north. One truck averages 42 miles
per hour and the other truck averages 38 miles per hour. At what time will they be 24 miles apart?
Undesirable:
a. 6 p.m.
b. 9 a.m.
c. 1 a.m.
d. 1 p.m.
e. 6 a.m.
Desirable:
a. 1 a.m.
b. 6 a.m.
c. 9 a.m.
d. 1 p.m.
e. 6 p.m.
10. Be sure there is only one correct or best response to the item.
Poor: The two most desired characteristics in a classroom test are validity and
a. Precision
b. Reliability*
c. Objectivity
d. Consistency*
Best: The two most desired characteristics in a classroom test are validity and
a. Precision
b. Reliability*
c. Objectivity
d. Standardization
11. Make alternatives approximately equal in length.
Poor: The most general cause of low individual incomes in the US is
a. Lack of valuable productive services to sell*
b. Unwillingness to work
c. Automation
d. Inflation
Better: What is the most general cause of low individual incomes in the US?
a. A lack of valuable productive services to sell*
b. The population’s overall unwillingness to work
c. The nation’s increased reliance on automation

d. An increasing national level of inflation.


12. Avoid irrelevant clues, such as grammatical structure, well-known verbal
associations or connections between stem and answer.
Poor: (grammatical clue) A chain of islands is called an
a. Archipelago
b. Peninsula
c. Continent
d. Isthmus
Poor: (verbal association) The reliability of a test can be estimated by a
coefficient of
a. Measurement
b. Correlation*
c. Testing
d. Error
Poor: (connections between stem and answer) The height to which a water
dam is built depends on
a. The length of the reservoir behind the dam.
b. The volume of the water behind the dam.
c. The height of the water behind the dam.*
d. The strength of the reinforcing wall.
13. Use at least four alternatives for each item to lower the probability of getting
the item correct by guessing.
14. Randomly distribute the correct responses among the alternative positions
throughout the test, with approximately the same proportion of the alternatives a, b, c, d, and
e as the correct response.
15. Use the alternatives NONE OF THE ABOVE and ALL OF THE ABOVE sparingly.
When used, such alternatives should occasionally be the correct response.
True-False Test Items
True-false test items are typically used to measure the ability to identify whether
statements of fact are correct. The basic format is simply a declarative statement that the
student must judge as true or false. A modification of the basic form requires the student
to respond "yes" or "no", or "agree" or "disagree."
Three Forms:
1. Simple – consists of only two choices
2. Complex – consists of more than two choices
3. Compound – two choices plus a conditional completion response
Examples:
Simple: The acquisition of morality is a developmental process. True False

Complex: The acquisition of morality is a developmental process. True False Opinion

Compound: The acquisition of morality is a developmental process. True False
If the statement is false, what makes it false?

Advantages of True-False Items

True-false items can provide:
1. The widest sampling of content or objectives per unit of testing time
2. Scoring efficiency and accuracy
3. Versatility in measuring all levels of cognitive ability
4. Highly reliable test scores
5. An objective measurement of student achievement or ability
Limitations of True-False Items
1. Incorporate an extremely high guessing factor.
2. Can often lead the teacher to write ambiguous statements
due to the difficulty of writing statements which are unequivocally true or false.
3. Do not discriminate between students of varying ability as well as other item types do.
4. Can often include more irrelevant clues than do other item types.
5. Can often lead a teacher to favour testing of trivial knowledge.

Suggestions for Writing True-False Items (Payne, 1984)


1. Base true-false items upon statements that are absolutely true or false, without
qualifications or exceptions.
Poor: Nearsightedness is hereditary in origin.
Better: Geneticists and eye specialists believe that the predisposition to
nearsightedness is hereditary.
2. Express the item statement as simply and clearly as possible.
Poor: When you see a highway with a marker that reads "Interstate 80," you
know that the construction and upkeep of that road are provided by both the local and
national government.
Better: The construction and maintenance of the interstate highways are
provided by both the local and national government.
3. Express a single idea in each test item.
Poor: Water will boil at a higher temperature if the atmospheric pressure on its
surface is increased and more heat is applied to the container.
Better: Water will boil at a higher temperature if the atmospheric pressure on its
surface is increased; or water will boil at a higher temperature if more heat is applied to the
container.

4. Include enough background information and qualifications so that the ability to


respond correctly to the item does not depend on some special, uncommon knowledge.
Poor: The second principle of education is that the individual gathers
knowledge.
Better: According to John Dewey, the second principle of education is that the
individual gathers knowledge.
5. Avoid lifting statements directly from the text, lecture or other materials so that
memory alone will not permit a correct answer.
Poor: For every action there is an equal and opposite reaction.
Better: If you were to stand in a canoe and throw a life jacket forward to another
canoe, chances are your canoe will jerk backward.
6. Avoid using negatively stated item statements.
Poor: The Supreme Court is not composed of nine justices.
Better: The Supreme Court is composed of nine justices
7. Avoid the use of unfamiliar vocabulary.
Poor: According to some politicians, the raison d’etre for capital punishment is
retribution.
Better: According to some politicians, justification for the existence of capital
punishment is retribution.
8. Avoid the use of specific determiners which should permit a test wise but
unprepared examinee to respond correctly. Specific determiners refer to sweeping terms like
always, all, none, never, impossible, inevitable. Statements including such terms are likely to be
false. On the other hand, statements using qualifying determiners such as usually, sometimes,
often, are likely to be true. When statements require specific determiners, make sure they
appear in both true and false items.
Poor: All sessions of Congress are called by the President. (F)
The Supreme Court is frequently required to rule on the constitutionality
of the law. (T)
The objective test is generally easier to score than an essay test. (T)
Better: When specific determiners are used, reverse the expected outcomes.
The sum of angles of a triangle is always 180 degrees. (T)
Each molecule of a given compound is chemically the same as every
other molecule of that compound. (T)
The galvanometer is the instrument usually used for the metering of
electrical energy use in a home. (F)
9. False items tend to discriminate more highly than true items. Therefore, use
more false items than true items (but not more than 15% additional false items).
Matching Test Items

In general, matching items consist of a column of stimuli presented on the left side
of the exam page and a column of responses placed on the right side of the page. Students are
required to match each stimulus with its associated response.
Advantages of Using Matching Test Items
1. Require short periods of reading and response time, allowing the teacher to cover
more content.
2. Provide objective measurement of student achievement or ability.
3. Provide highly reliable test scores.
4. Provide scoring efficiency and accuracy.
Disadvantages of Using Matching Test Items
1. Have difficulty measuring learning objectives requiring more than simple recall
of information.
2. Are difficult to construct due to the problem of selecting a common set of stimuli
and responses.
Suggestions for Writing Matching Test items
1. Include directions which clearly state the basis for matching the stimuli with the
responses. Explain whether or not the response can be used more than once and indicate
where to write the answer.
Poor: Directions: Match the following.
Better: Directions: On the line to the left of each identifying location and
characteristic in Column I, write the letter of the country in Column II that is best defined.
Each country in Column II may be used more than once.
2. Use only homogeneous material in matching items.
Poor: Directions: Match the following.
1. Water A. NaCl
2. Discovered radium B. Fermi
3. Salt C. NH3
4. Year of the first nuclear fission by man D. 1942
5. Ammonia E. Curie
Better: Directions: On the line to the left of each compound in Column I, write
the letter of the compound's formula presented in Column II. Use each formula only once.
Column I Column II
1. Water A. H2SO4
2. Salt B. HCl
3. Ammonia C. NaCl
4. Sulfuric acid D. H2O
E. H2HCl

3. Arrange the list of responses in some systematic order if possible – chronological,
alphabetical.
Directions: On the line to the left of each definition in Column I, write the letter of the
defense mechanism in Column II that is described. Use each defense mechanism only once.
Column I
1. Hunting for reasons to support one's beliefs
2. Accepting the values and norms of others as one's own, even if
they are contrary to previously held values
3. Attributing one's own unacceptable impulses, thoughts and
desires to others
4. Ignoring disagreeable situations, thoughts and desires
Column II (undesirable order) Column II (desirable, alphabetical order)
A. Rationalization A. Denial of Reality
B. Identification B. Identification
C. Projection C. Introjection
D. Introjection D. Projection
E. Denial of Reality E. Rationalization

4. Avoid grammatical or other clues to correct response.


Poor: Directions: Match the following in order to complete the sentence on the left.
1. Igneous rocks are formed A. a hardness of 7
2. The formation of coal requires B. with crystalline rock
3. A geode is filled C. a metamorphic rock
4. Feldspar is classified as D. through the solidification of molten rock
Better: Avoid sentence completion formats, due to their grammatical clues.
Note:
1. Keep matching items brief, limiting the list of stimuli to under 10
2. Include more responses than stimuli to help prevent answering through the
process of elimination.
3. When possible, reduce the amount of reading time by including only short
phrases or single words in the response list.

Completion Test Items


The completion items require the student to answer a question or to finish an
incomplete statement by filling in a blank with correct word or phrase.
Example:

According to Freud, personality is made up of three major systems: the ____,
the ____, and the ____.
Advantages of Using Completion Items
Completion items can:
1. Provide a wide sampling of content;
2. Efficiently measure lower levels of cognitive ability;
3. Minimize guessing as compared to multiple choice or true-false items; and
4. Usually provide an objective measure of student achievement or ability.
Limitations of Using Completion Items
Completion items:
1. Are difficult to construct so that the desired response is clearly indicated;
2. Have difficulty in measuring learning objectives requiring more than simple recall
of information;
3. Can often include more irrelevant clues than do other item types;
4. Are more time consuming to score when compared to multiple choice or true-
false items; and
5. Are more difficult to score since more than one answer may have to be
considered correct if the item was not properly prepared.
Suggestions for Writing Completion Test Items
1. Omit only significant words from the statement.
Poor: Every atom has a central ____ called a nucleus. (core)
Better: Every atom has a central core called a(n) ____. (nucleus)
2. Do not omit so many words from the statement that the intended meaning is
lost.
Poor: The ____ were to Egypt as the ____ were to Persia as the ____ were to the
early tribes of Israel.
Better: The Pharaohs were to Egypt as the ____ were to Persia as the ____ were to the
early tribes of Israel.
3. Avoid grammatical or other clues to the correct response.
Poor: Most of the United States' libraries are organized according to the ____
decimal system. (Dewey)
Better: Which organizational system is used by most of the United States'
libraries? (Dewey Decimal)
4. Be sure there is only one correct response.
Poor: Trees which shed their leaves annually are ____. (seed-bearing, common)
Better: Trees which shed their leaves annually are called ____. (deciduous)
5. Make the blanks of equal length.
Poor: In Greek mythology, Vulcan was the son of _______ and ____. (Jupiter and Juno)
Better: In Greek mythology, Vulcan was the son of ____ and ____.


6. When possible, delete words at the end of the statement, after the student has
been presented a clearly defined problem.
Poor: ____ is the molecular weight of KClO3. (122.5)
Better: The molecular weight of KClO3 is ____. (122.5)
7. Avoid lifting statements directly from the text, lecture or other sources.
8. Limit the required response to a single word or phrase.

Essay Test Items


A classroom essay test consists of a small number of questions to which the
student is expected to respond by demonstrating his/her ability to:
a. Recall factual knowledge;
b. Organize this knowledge; and
c. Present the knowledge in a logical, integrated answer to the question.
Classification of Essay Test:
1. Extended-response essay item
2. Limited Response or Short-answer essay item
Example of Extended-Response Essay Item:
Explain the difference between the S-R (Stimulus-Response) and the S-O-R (Stimulus-
Organism-Response) theories of personality. Include in your answer the following:
a. Brief description of both theories
b. Supporters of both theories
c. Research methods used to study each of the two theories (20 pts)
Example of Short-Answer Essay Item:
Identify research methods used to study the (Stimulus-Response) and the S-O-R
(Stimulus-Organism-Response) theories of personality. (10pts)
Advantages of Using Essay Items
Essay items:
1. Are easier and less time consuming to construct than most item types;
2. Provide a means for testing students’ ability to compose an answer and present it
in a logical manner; and
3. Can efficiently measure higher order cognitive objectives – analysis, synthesis,
evaluation.

Limitations of Using Essay Items


Essay Items:
1. Cannot measure a large amount of content or objectives;
2. Generally provide low test scorer reliability;

3. Require an extensive amount of instructor’s time to read and grade; and


4. Generally do not provide an objective measure of student achievement or ability
(subject to bias on the part of the grader)
Suggestions for Writing the Essay Test Items
1. Prepare essay items that elicit the type of behaviour you want to measure.
Learning Objective: The student will be able to explain how the normal curve
serves as a statistical model.
Poor: Describe a normal curve in terms of symmetry, modality, kurtosis and
skewness.
Better: Briefly explain how the normal curve serves as a statistical model for
estimation and hypothesis testing.
2. Phrase each item so that the student's task is clearly indicated.
Poor: Discuss the economic factors which led to stock market crash of 2008.
Better: Identify the three economic conditions which led to the stock market
crash of 2008. Discuss briefly each condition in correct chronological sequence and in one
paragraph indicate how the three factors were interrelated.
3. Indicate for each item a point value or weight and an estimated time limit for
answering.
Poor: Compare the writing of Bret Harte and Mark Twain in terms of setting,
depth of characterization, and dialogue styles of their main characters.
Better: Compare the writings of Bret Harte and Mark Twain in terms of setting, depth
of characterization, and dialogue styles of their main characters. (10 points, 20 minutes)
4. Ask questions that will elicit responses on which experts could agree that one
answer is better than another.
5. Avoid giving a student a choice among optional items as this greatly reduces the
reliability of the test.
6. It is generally recommended for classroom examinations to administer several
short-answer items rather than only one or two extended-response items.

Guidelines for Grading Essay Items


1. When writing each essay item, simultaneously develop a scoring rubric.
2. To maintain a consistent scoring system and ensure the same criteria are applied
to all assessments, score one essay item across all tests prior to scoring the next item.
3. To reduce the influence of the halo effect, bias and other subconscious factors,
all essay questions should be graded blind to the identity of the student.
4. Due to the subjective nature of graded essays, the score on one essay may be
influenced by the quality of previous essays. To prevent this type of bias, reshuffle the order of
papers after reading through each item.

Principle 3: Balanced
- A balanced assessment sets targets in all domains of learning (cognitive,
affective, and psychomotor) or domains of intelligence (verbal-linguistic, logical-
mathematical, bodily-kinaesthetic, visual-spatial, musical-rhythmic, interpersonal-social,
intrapersonal-introspection, physical world-natural-existential-spiritual).
- A balanced assessment makes use of both traditional and alternative assessment.
Principle 4. Validity
Validity – is the degree to which the assessment instrument measures what it intends
to measure.
 It also refers to the usefulness of the instrument for a given purpose.
 It is the most important criterion of a good assessment instrument.
Ways in Establishing Validity
1. Face Validity- is done by examining the physical appearance of the
instrument.
2. Content Validity- is done through a careful and critical examination of the
objectives of assessment so that it reflects the curricular objectives.
3. Criterion-related Validity- is established statistically such that a set of scores
revealed by the measuring instrument IS CORRELATED with the scores obtained in another
EXTERNAL PREDICTOR OR MEASURE.
It has two purposes:
a. Concurrent Validity- describes the present status of the individual by correlating the sets of
scores obtained FROM TWO MEASUREs GIVEN CONCURRENTLY.
Example: Relate the reading test result with pupils’ average grades in reading given by the
teacher.
b. Predictive Validity- describes the future performance of an individual by
correlating the sets of scores obtained from TWO MEASURES GIVEN AT A LONGER TIME
INTERVAL.
Example: The entrance examination scores of a freshmen class at the
beginning of the school year are correlated with their average grades at the end of the school year.
4. Construct Validity- Validity established by analysing the activities and processes
that correspond to a particular concept; is established statistically by comparing psychological
traits or factors that theoretically influence scores in a test.
a. Convergent validity helps to establish construct validity when you use two different
measurement procedures and research methods (e.g., participant observation and a survey) in
your study to collect data about a construct (e.g., anger, depression, motivation, task
performance).

b. Divergent validity helps to establish construct validity by demonstrating that the construct
you are interested in (e.g., anger) is different from other constructs that might be present in
your study (e.g., depression).

Factors Influencing the Validity of an Assessment Instrument


1. Unclear directions- directions that do not clearly indicate to the students how to
respond to the tasks and how to record the responses tend to reduce validity.
2. Reading vocabulary and sentence structure too difficult- vocabulary and
sentence structure that are too complicated for the student turn the test into an assessment of
reading comprehension, thus altering the meaning of assessment results.
3. Ambiguity- Ambiguous statements in assessments task contribute to
misinterpretations and confusion. Ambiguity sometimes confuses the better students more
than it does the poor students.
4. Inadequate time limits- time limits that do not provide students with enough
time to consider the tasks and provide thoughtful responses can reduce the validity of
interpretations of results.
5. Overemphasis of easy-to-assess aspects of the domain at the expense of
important, but hard-to-assess aspects (construct underrepresentation)- it is easy to develop
test questions that assess factual recall, and generally harder to develop ones that tap conceptual
understanding or higher-order thinking processes such as the evaluation of competing
positions or arguments. Hence it is important to guard against underrepresentation of tasks
tapping the important, but more difficult to assess, aspects of achievement.
6. Test items inappropriate for the outcomes being measured- attempting to
measure understanding, thinking, skills and other complex types of achievement with test
forms that are appropriate for only measuring factual knowledge will invalidate the results.
7. Poorly constructed test items- test items that unintentionally provide clues to
the answer tend to measure the students’ alertness in detecting clues as well as mastery of
skills or knowledge the test is intended to measure
8. Test too short- if a test is too short to provide a representative sample of the
performance we are interested in, its validity will suffer accordingly.
9. Improper arrangement of items- test items are typically arranged in order of
difficulty, with the easiest items first. Placing difficult items first in the test may cause students
to spend too much time on these and prevent them from reaching items they could easily
answer. Improper arrangement may also influence validity by having a detrimental effect on
student motivation.
10. Identifiable pattern of answers- placing correct answers in some systematic
pattern (e.g., T,T,F,F or B,B,B,C,C,C,D,D,D) enables students to guess the answers to some items
more easily, and this lowers validity.

TABLE OF SPECIFICATIONS – TOS

A table of specification is a device for describing test items in terms of the content and the process
dimensions, that is, what a student is expected to know and what he or she is expected to do with that
knowledge. Each item is described by a combination of content and process in the table of specification.

Sample of a one-way table of specification in Linear Function

Content                               Number of Class   Number of   Test Item
                                      Sessions          Items       Distribution
1. Definition of linear function      2                 4           1-4
2. Slope of a line                    2                 4           5-8
3. Graph of linear function           2                 4           9-12
4. Equation of linear function        2                 4           13-16
5. Standard forms of a line           3                 6           17-22
6. Parallel and perpendicular lines   4                 8           23-30
7. Application of linear functions    5                 10          31-40
TOTAL                                 20                40          40

Number of items = (Number of class sessions x Desired total number of items) ÷ Total number of class sessions

Example:

Number of items for the topic "Definition of linear function":

Number of class sessions = 2
Desired total number of items = 40
Total number of class sessions = 20

Number of items = (2 x 40) ÷ 20 = 4
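The allocation rule above can be sketched in a short Python snippet; the topic names and session counts are taken from the sample one-way table, and the variable names are illustrative. Integer division is safe here because every product divides evenly; a real table with fractional results would need a rounding policy.

```python
# Sketch of the one-way TOS item allocation shown above.
# Number of items = (class sessions x desired total items) / total sessions
topics = {
    "Definition of linear function": 2,
    "Slope of a line": 2,
    "Graph of linear function": 2,
    "Equation of linear function": 2,
    "Standard forms of a line": 3,
    "Parallel and perpendicular lines": 4,
    "Application of linear functions": 5,
}

desired_items = 40
total_sessions = sum(topics.values())  # 20

allocation = {topic: sessions * desired_items // total_sessions
              for topic, sessions in topics.items()}

print(allocation["Definition of linear function"])  # 4
print(sum(allocation.values()))  # 40
```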

Sample of a two-way table of specification in Linear Function

Content                              Class   Know  Comp  App  Analysis  Synthesis  Evaluation  Total
                                     hours
1. Definition of linear function     2       1     1     1    1                                4
2. Slope of a line                   2       1     1     1    1                                4
3. Graph of linear function          2       1     1     1    1                                4
4. Equation of linear function       2       1     1     1    1                                4
5. Standard forms of a line          3       1     1     1    1         1          1           6
6. Parallel and perpendicular lines  4       1     2     1    2                                8
7. Application of linear functions   5       1     1     3    1         3                      10
TOTAL                                20      4     6     8    8         7          7           40

Learning References:

[9] Navaro, R.L., Santos, R.G., and Corpuz, B.B. (2019). Assessment of Learning, OBE
& PPST Based, Fourth Edition. Lorimar Publishing, Inc.
[10] Bartlett, J. (2015). Outstanding Assessment for Learning in the Classroom. Routledge, Taylor &
Francis Group.
[11] Frey, N. and Fisher, D. (2011). The Formative Assessment Action Plan. USA: ASCD.
[12] Lewin and Shoemaker, B. (2011). Great Performances: Creating Classroom-Based Assessment
Tasks, 2nd Edition. USA: ASCD.
[13] Ecclestone, K. et al. (2010). Transforming Formative Assessment in Lifelong Learning. UK:
McGraw-Hill Open University Press.
[14] Airasian, Peter W. (2005). Classroom Assessment: Concepts and Applications. New York:
McGraw-Hill Companies, Inc.
[15] Shermis, Mark D. and Di Vesta, Francis J. (2011). Classroom Assessment in Action. Rowman &
Littlefield Publishers, Inc.
[16] Anderson, Lorin W. (2003). Classroom Assessment: Enhancing the Quality of
Teacher Decision Making. Lawrence Erlbaum Associates, Publishers.

Post-Assessment Learning Activities:


1. Construct a table of specifications according to your major.
2. Construct test items in terms of:
2.1- multiple choice test with TOS
2.2- true or false test
2.3- identification test
2.4- matching test

Final
Topics: MODULE 4: DESCRIPTION OF ASSESSMENT DATA
MODULE 5: INTERPRETATION AND UTILIZATION OF TEST RESULTS

Pre-Assessment Learning Activities:


1. What is assessment data?
2. How to interpret data using test results

Lesson Discussion:
MODULE 4: DESCRIPTION OF ASSESSMENT DATA
ITEM ANALYSIS

Item analysis refers to the process of examining the students' responses to each item in the test.
According to Abubakar S. Asaad and William M. Hailaya (Measurement and Evaluation Concepts &
Principles, Rex Bookstore, 2004 Edition), there are two characteristics of an item: desirable
and undesirable. An item that has desirable characteristics can be retained for subsequent
use, while one with undesirable characteristics is either revised or rejected.

These are the criteria for determining the desirability or undesirability of an item:



a. Difficulty of an item
b. Discriminating power of an item
c. Measures of attractiveness

Difficulty index refers to the proportion of students in the upper and lower groups who
answered an item correctly. In a classroom achievement test, the desired indices of difficulty are not lower
than 0.20 nor higher than 0.80, with an average index of difficulty from 0.30 or 0.40 to a maximum of 0.60.

DF = (PUG + PLG) / 2

PUG = proportion of the upper group who got an item right
PLG = proportion of the lower group who got an item right

Level of Difficulty of an Item

Index Range Difficulty Level

0.00-0.20 Very difficult

0.21-0.40 Difficult

0.41-0.60 Moderately Difficult

0.61-0.80 Easy

0.81-1.00 Very Easy
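The difficulty-index formula and the interpretation table above can be sketched as follows; the function names are illustrative, and the level labels follow the index ranges in the table.

```python
# Hedged sketch: DF = (PUG + PLG) / 2, interpreted against the index ranges above.
def difficulty_index(pug: float, plg: float) -> float:
    """Average of the proportions of the upper and lower groups who got the item right."""
    return round((pug + plg) / 2, 2)

def difficulty_level(df: float) -> str:
    """Map a difficulty index onto the verbal levels of the table above."""
    if df <= 0.20:
        return "Very difficult"
    if df <= 0.40:
        return "Difficult"
    if df <= 0.60:
        return "Moderately difficult"
    if df <= 0.80:
        return "Easy"
    return "Very easy"

df = difficulty_index(0.60, 0.30)
print(df, difficulty_level(df))  # 0.45 Moderately difficult
```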

Index of Discrimination

Discrimination index is the difference between the proportion of high-performing students who got an
item right and the proportion of low-performing students who got it right. The high- and low-performing
groups are usually defined as the upper 27% and the lower 27% of the students based on the total
examination score. An item has positive discrimination if the proportion of students who got the item
right in the upper performing group is greater than that in the lower performing group, negative
discrimination if the proportion is greater in the lower group than in the upper group, and zero
discrimination if the proportions in the upper and lower performing groups are equal.

Discrimination Index Item Evaluation

0.40 and up Very good item



0.30-0.39 Reasonably good item but possibly subject to


improvement
0.20-0.29 Marginal item, usually needing and being subject to
improvement
0.19 and below Poor item, to be rejected or improved by
revision

Maximum discrimination is the sum of the proportions of the upper and lower groups who answered the
item correctly. Possible maximum discrimination will occur if half or less of the sum of the upper and
lower groups answered an item correctly.

Discriminating Efficiency is the index of discrimination divided by the maximum discrimination.

PUG = proportion of the upper group who got an item right

PLG= proportion of the lower group who got an item right

Di = discrimination index

DM – Maximum discrimination

DE = Discriminating Efficiency

Formula:

Di = PUG – PLG

DM = PUG + PLG

DE = Di / DM

Example: Eighty students took an examination in Algebra; 6 students in the upper group and 4 students
in the lower group got the correct answer for item number 6. Find the discriminating efficiency.

Given:

Number of students who took the exam = 80

27% of 80 = 21.6 or 22, which means that there are 22 students in the upper performing group and 22
students in the lower performing group.

PUG = 6/22 = 27%



PLG = 4/22 = 18%

Di = PUG- PLG

= 27%- 18%

Di= 9%

DM = PUG +PLG

= 27% + 18%

DM= 45%

DE = Di/DM

= .09/.45

DE = 0.20 or 20%

This can be interpreted as: on the average, the item is discriminating at 20% of the potential of an item of
that difficulty.
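The worked example above translates directly into a few lines of Python; the figures (6 of 22 upper-group and 4 of 22 lower-group examinees correct) come from the example, and the variable names are illustrative.

```python
# Sketch of the discriminating-efficiency computation in the worked example above.
upper_correct, lower_correct, group_size = 6, 4, 22  # 27% of 80 examinees ≈ 22 per group

pug = upper_correct / group_size  # ≈ 0.27
plg = lower_correct / group_size  # ≈ 0.18

di = pug - plg   # discrimination index
dm = pug + plg   # maximum discrimination
de = di / dm     # discriminating efficiency

print(round(di, 2), round(dm, 2), round(de, 2))  # 0.09 0.45 0.2
```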

Measures of Attractiveness

To measure the attractiveness of the incorrect options (distracters) in multiple-choice tests, we count the
number of students who selected each incorrect option in both the upper and lower groups. An incorrect
option is said to be an effective distracter if more students in the lower group than in the upper group
chose that option.

Steps of Item Analysis

1. Rank the scores of the students from highest score to lowest score.
2. Select 27% of the papers within the upper performing group and 27% of the papers within the
lower performing group.
3. Set aside the 46% of papers because they will not be used for item analysis.
4. Tabulate the number of students in the upper group and lower group who selected each
alternative.
5. Compute the difficulty of each item.
6. Compute the discriminating power of each item.
7. Evaluate the effectiveness of the distracters.
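The steps above can be sketched for a single multiple-choice item; the response data below are invented for illustration (the correct option is assumed to be "b"), and the helper names are hypothetical.

```python
# Illustrative item analysis for one item: (total score, chosen option) per examinee.
examinees = [(95, "b"), (90, "b"), (88, "b"), (84, "c"), (80, "b"),
             (75, "a"), (70, "d"), (65, "a"), (60, "c"), (55, "a")]

examinees.sort(key=lambda e: e[0], reverse=True)  # 1. rank by score
k = max(1, round(0.27 * len(examinees)))          # 2. take 27% cuts (here 3)
upper, lower = examinees[:k], examinees[-k:]      # 3. middle papers are set aside

def proportion_correct(group, key="b"):
    """4. Tabulate the group's choices and return the proportion who chose the key."""
    return sum(1 for _, choice in group if choice == key) / len(group)

pug, plg = proportion_correct(upper), proportion_correct(lower)
difficulty = (pug + plg) / 2   # 5. difficulty index
discrimination = pug - plg     # 6. discrimination index
print(round(difficulty, 2), round(discrimination, 2))  # 0.5 1.0

# 7. A distracter is effective if more lower-group than upper-group examinees chose it.
for option in "acd":
    chosen_low = sum(1 for _, c in lower if c == option)
    chosen_up = sum(1 for _, c in upper if c == option)
    print(option, "effective" if chosen_low > chosen_up else "weak")
```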

MODULE 5: INTERPRETATION AND UTILIZATION OF TEST RESULTS

CRITERION-REFERENCED VS. NORM-REFERENCED INTERPRETATION OF TEST RESULTS

STATISTICAL ORGANIZATION OF TEST SCORES

We shall discuss the different statistical techniques used in describing and analyzing test results.

1. Measures of Central Tendency (Averages)
2. Measures of Variability (Spread of Scores)
3. Measures of Relationship (Correlation)

4. Skewness

A measure of central tendency is a single value that is used to identify the center of the data; it is thought
of as the typical value in a set of scores. It tends to lie within the center of the data when the scores are
arranged from lowest to highest or vice versa. There are three commonly used measures of central
tendency: the mean, median and mode.

The Mean

The mean is the most common measure of center and is also known as the arithmetic average.

Sample Mean = ∑x / n

∑x = sum of the scores
x = individual score
n = number of scores

Steps in solving the mean value using raw scores

1. Get the sum of all the scores in the distribution


2. Identify the number of scores (n)
3. Substitute into the given formula and solve for the mean value

Example: Find the mean of the scores of students in algebra quiz

(x) scores in algebra

45
35
48
60
44
39
47
55
58
54
∑x = 485
n= 10

Mean = ∑x
n
= 485÷ 10
Mean = 48.5
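The computation above can be written as a one-line sketch in Python, using the same ten algebra scores:

```python
# Mean of the algebra quiz scores: sum of the scores divided by their number.
scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
mean = sum(scores) / len(scores)
print(mean)  # 48.5
```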

Properties of Mean

1. Easy to compute
2. It may not be an actual observation in the data set
3. It can be subjected to numerous mathematical computations
4. Most widely used
5. Every data value contributes to the mean value
6. It is easily affected by extreme values
7. Applied to interval level data

The Median
The median is a point that divides the scores in a distribution into two equal parts when the scores are
arranged according to magnitude, that is, from lowest to highest or highest to lowest. If the number of
scores is odd, the value of the median is the middle score. When the number of scores is even, the median
value is the average of the two middle scores.

Example: 1. Find the median of the scores of 10 students in algebra quiz.

(x) scores of students in algebra


45
35
48
60
44
39
47
55
58
54

First, arrange the scores from lowest to highest and find the average of the two middle-most scores, since
the number of cases is even.
35
39
44
45
47
48
54
55
58
60

Median = (47 + 48) / 2
= 47.5 is the median score

50% of the scores in the distribution fall below 47.5



Example 2. Find the median of the scores of 9 students in algebra quiz

(x) scores of students in algebra


35
39
44
45
47
48
54
55
58

The median value is the 5th score, which is 47. This means that 50% of the scores fall below 47.
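Both cases (an even and an odd number of scores) can be handled by one small function; the score lists are the two examples worked above.

```python
# Median: middle score for an odd count, average of the two middle scores for an even count.
def median(scores):
    s = sorted(scores)          # arrange according to magnitude
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # odd number of scores
    return (s[mid - 1] + s[mid]) / 2  # even number of scores

print(median([45, 35, 48, 60, 44, 39, 47, 55, 58, 54]))  # 47.5
print(median([35, 39, 44, 45, 47, 48, 54, 55, 58]))      # 47
```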

Properties of Median

1. It is not affected by extreme values
2. It is applied to ordinal level data
3. The middle-most score in the distribution
4. Most appropriate when there are extreme scores

The Mode

The mode refers to the score or scores that occur most often in the distribution. There are three
classifications of mode: a) unimodal, a distribution that consists of only one mode; b) bimodal, a
distribution of scores that consists of two modes; and c) multimodal, a score distribution that consists of
more than two modes.

Properties of Mode

1. It is the score/s that occurred most frequently
2. Nominal average
3. It can be used for qualitative and quantitative data
4. Not affected by extreme values
5. It may not exist

Example 1. Find the mode of the scores of students in algebra quiz: 34,36,45,65,34,45,55,61,34,46

Mode= 34 , because it appeared three times. The distribution is called unimodal.



Example 2. Find the mode of the scores of students in algebra quiz: 34,36,45,61,34,45,55,61,34,45

Mode = 34 and 45, because both appeared three times. The distribution is called bimodal
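The two mode examples above can be reproduced with a frequency count; whether the result is unimodal, bimodal or multimodal depends on how many scores tie for the highest frequency.

```python
# Mode(s): the score or scores with the highest frequency in the distribution.
from collections import Counter

def modes(scores):
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(modes([34, 36, 45, 65, 34, 45, 55, 61, 34, 46]))  # [34] -> unimodal
print(modes([34, 36, 45, 61, 34, 45, 55, 61, 34, 45]))  # [34, 45] -> bimodal
```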

Measures of Variability

A measure of variability is a single value that is used to describe the spread of the scores in a
distribution, that is, above or below the measure of central tendency. There are three commonly used
measures of variability: the range, quartile deviation and standard deviation.

The Range

Range is the difference between the highest and lowest score in the data set: R = HS - LS

Properties of Range

1. Simplest and crudest measure


2. A rough measure of variation
3. The smaller the value, the closer the scores are to each other; the higher the value, the more
scattered the scores are.
4. The value fluctuates easily: a change in either the highest score or the lowest score changes
the value of the range.

Example: The scores of 10 students in Mathematics and Science are given below. Find the range. Which
subject has greater variability?

Mathematics Science

35 35

33 40

45 25

55 47

62 55

34 35

54 45

36 57

47 39

40 52

Mathematics Science

HS = 62 HS =57

LS= 33 LS= 25

R = HS-LS R= HS-LS

R= 62-33 R= 57-25

R= 29 R= 32

Based on the computed values of the range, the scores in Science have greater variability; that is, the
scores in Science are more scattered than the scores in Mathematics.
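The range comparison above takes only a few lines in Python, using the same two score lists:

```python
# Range = highest score - lowest score, for each subject.
math_scores = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]
science_scores = [35, 40, 25, 47, 55, 35, 45, 57, 39, 52]

r_math = max(math_scores) - min(math_scores)           # 62 - 33
r_science = max(science_scores) - min(science_scores)  # 57 - 25

print(r_math, r_science)  # 29 32 -> Science scores are more scattered
```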

The Quartile Deviation

Quartile deviation is half of the difference between the third quartile (Q3) and the first quartile (Q1). It is
based on the middle 50% of the distribution, rather than on the range of the entire set. In symbols:

QD = (Q3 - Q1) / 2

QD= quartile deviation

Q3= third quartile value

Q1= first quartile value

Example: In the scores of 50 students, Q3 = 50.25 and Q1 = 25.45. Find the QD.

QD = (Q3 − Q1) / 2
   = (50.25 − 25.45) / 2
QD = 12.4

The value QD = 12.4 indicates the distance we need to go above or below the median to include approximately the middle 50% of the scores.
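A one-line Python check of the example, assuming the quartile values are already known (computing Q1 and Q3 from raw scores depends on the interpolation method used, so it is not shown here):

```python
def quartile_deviation(q3, q1):
    """Half the distance between the third and first quartiles."""
    return (q3 - q1) / 2

print(round(quartile_deviation(50.25, 25.45), 2))  # 12.4
```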

The standard deviation

The standard deviation is the most important and useful measure of variation; it is the square root of the variance. It is an average of the degree to which each score in the distribution deviates from the mean value. It is a more stable measure of variation than the range and the quartile deviation because it involves all the scores in a distribution.

SD = √( ∑(x − mean)² / (n − 1) )

where
x = individual score
n = number of scores in the distribution

Example 1. Find the standard deviation of the scores of 10 students in an algebra quiz, using the data below.

x     (x − mean)²
45    12.25
35    182.25
48    0.25
60    132.25
44    20.25
39    90.25
47    2.25
55    42.25
58    90.25
54    30.25

∑x = 485     ∑(x − mean)² = 602.5

n = 10

Mean = ∑x / n = 485 / 10 = 48.5

SD = √( ∑(x − mean)² / (n − 1) )
SD = √( 602.5 / (10 − 1) )
SD = √66.944444
SD = 8.18

This means that, on average, the scores deviate from the mean value of 48.5 by about 8.18 points.
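The same computation can be sketched in Python (`sample_sd` is our own name; the formula divides by n − 1, matching the text):

```python
import math

def sample_sd(scores):
    """Sample standard deviation: sqrt of the sum of squared deviations over n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

quiz = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
print(round(sample_sd(quiz), 2))  # 8.18
```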

Example 2. Find the standard deviation of the scores of the 10 students below. Which subject has the greater variability?

Mathematics   Science
35            35
33            40
45            25
55            47
62            55
34            35
54            45
36            57
47            39
40            52

Solve for the standard deviation of the scores in Mathematics:

Mathematics (x)   (x − mean)²
35                82.81
33                123.21
45                0.81
55                118.81
62                320.41
34                102.01
54                98.01
36                65.61
47                8.41
40                16.81

∑x = 441          ∑(x − mean)² = 936.9

Mean = 441 / 10 = 44.1

SD = √( ∑(x − mean)² / (n − 1) )
   = √( 936.9 / (10 − 1) )
   = √104.1
SD = 10.20 for the Mathematics subject

Solve for the standard deviation of the scores in Science:

Science (x)   (x − mean)²
35            64
40            9
25            324
47            16
55            144
35            64
45            4
57            196
39            16
52            81

∑x = 430      ∑(x − mean)² = 918

Mean = 430 / 10 = 43

SD = √( ∑(x − mean)² / (n − 1) )
   = √( 918 / (10 − 1) )
   = √102
SD = 10.10 for the Science subject

The standard deviation for the Mathematics subject is 10.20 and the standard deviation for the Science subject is 10.10, which means that the Mathematics scores have a greater variability than the Science scores. In other words, the scores in Mathematics are more scattered than the scores in Science.

Interpretation of Standard Deviation

When the value of the standard deviation is large, the scores are, on average, far from the mean. On the other hand, when the value of the standard deviation is small, the scores are, on average, close to the mean.

Coefficient of Variation

The coefficient of variation is a measure of relative variation expressed as a percentage of the arithmetic mean. It is used to compare the variability of two or more sets of data even when the observations are expressed in different units of measurement. The coefficient of variation can be solved using the formula

CV = (SD / Mean) × 100%

The lower the value of the coefficient of variation, the more closely the overall data approximate the mean, or the more homogeneous the performance of the group.

Example: A study compared the performance of two groups, A and B, on a test. Group A obtained a mean score of 87 points with a standard deviation of 8.5 points; Group B obtained a mean score of 90 points with a standard deviation of 10.25 points. Which of the two groups has the more homogeneous performance?

Group   Mean   Standard deviation
A       87     8.5
B       90     10.25

CV Group A = (standard deviation / mean) × 100%
           = (8.5 / 87) × 100%
CV Group A = 9.77%

CV Group B = (standard deviation / mean) × 100%
           = (10.25 / 90) × 100%
CV Group B = 11.39%

The CV of Group A is 9.77% and the CV of Group B is 11.39%, which means that Group A has the more homogeneous performance.
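A quick Python check of the two coefficients (the function name is ours):

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation as a percentage of the mean."""
    return sd / mean * 100

print(round(coefficient_of_variation(8.5, 87), 2))    # 9.77  (Group A)
print(round(coefficient_of_variation(10.25, 90), 2))  # 11.39 (Group B)
```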

Percentile Rank
The percentile rank of a score is the percentage of scores in the frequency distribution that fall below it; that is, the percentage of examinees in the norm group who scored below the score of interest. Percentile ranks are commonly used to clarify the interpretation of scores on standardized tests.

Z- SCORE
The z-score (also known as the standard score) measures how many standard deviations an observation is above or below the mean. A positive z-score gives the number of standard deviations a score is above the mean, and a negative z-score gives the number of standard deviations a score is below the mean.

The z-score can be computed using the formulas

Z = (x − µ) / σ   for a population

Z = (x − mean) / SD   for a sample

where
x = the raw score
σ = the standard deviation of the population
µ = the mean of the population
SD = the standard deviation of the sample

EXAMPLE:

James Mark's examination results in three subjects are as follows:

Subject            Mean   Standard deviation   James Mark's grade
Math Analysis      88     10                   95
Natural Science    85     5                    80
Labor Management   92     7.5                  94




In what subject did James Mark perform best? In what subject did he perform poorest?

Z (Math Analysis) = (95 − 88) / 10
Z (Math Analysis) = 0.70

Z (Natural Science) = (80 − 85) / 5
Z (Natural Science) = −1.00

Z (Labor Management) = (94 − 92) / 7.5
Z (Labor Management) = 0.27

James Mark's grade in Math Analysis was 0.70 standard deviation above the mean of the Math Analysis grades, while in Natural Science he was 1.0 standard deviation below the mean of the Natural Science grades. His grade in Labor Management was 0.27 standard deviation above the mean of the Labor Management grades. Comparing the z-scores, James Mark performed best in Math Analysis and poorest in Natural Science in relation to the group performance.

T-score

The T-score can be obtained by multiplying the z-score by 10 and adding 50 to the product. In symbols, T-score = 10z + 50.

Using the same exercise, compute James Mark's T-scores in Math Analysis, Natural Science and Labor Management:

T-score (Math Analysis) = 10(0.70) + 50 = 57
T-score (Natural Science) = 10(−1) + 50 = 40
T-score (Labor Management) = 10(0.27) + 50 = 52.7

Since the highest T-score is in Math Analysis (57), we can conclude that James Mark performed better in Math Analysis than in Natural Science and Labor Management.
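Both standard scores can be reproduced with a short Python sketch (function names ours):

```python
def z_score(x, mean, sd):
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def t_score(z):
    """T-score: rescales z to a mean of 50 and a standard deviation of 10."""
    return 10 * z + 50

# James Mark's grades against each subject's mean and standard deviation
print(round(z_score(95, 88, 10), 2))   # 0.7   Math Analysis
print(round(z_score(80, 85, 5), 2))    # -1.0  Natural Science
print(round(z_score(94, 92, 7.5), 2))  # 0.27  Labor Management
print(t_score(z_score(95, 88, 10)))    # 57.0
```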

Stanine

Stanine, also known as standard nine, is a simple type of normalized standard score that illustrates the process of normalization. Stanines are single-digit scores ranging from 1 to 9.

The distribution of scores is divided into nine parts:

Stanine            1    2    3     4     5     6     7     8    9
Percent of cases   4%   7%   12%   17%   20%   17%   12%   7%   4%
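Under the 4-7-12-17-20-17-12-7-4 split, the cumulative cut points are 4, 11, 23, 40, 60, 77, 89 and 96 percent. A sketch mapping a percentile rank to its stanine (the handling of scores exactly on a boundary is a convention we chose here):

```python
import bisect

# Cumulative percentages marking the upper edge of stanines 1 through 8
CUT_POINTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Map a percentile rank (0-100) to a stanine from 1 to 9."""
    return bisect.bisect_right(CUT_POINTS, percentile_rank) + 1

print(stanine(50))  # 5 -> the middle 20% of the distribution
print(stanine(97))  # 9
```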

Skewness

Skewness describes the degree of departure of the distribution of the data from symmetry. The degree of skewness is measured by the coefficient of skewness, denoted SK and computed as

SK = 3(mean − median) / SD

The normal curve is a symmetrical bell-shaped curve whose end tails are continuous and asymptotic. Its mean, median and mode are equal. The scores are normally distributed if the computed value of SK = 0.

Areas Under the Normal Curve

A distribution is positively skewed when the curve is skewed to the right: it has a long tail extending off to the right but a short tail to the left. It indicates the presence of a small proportion of relatively large extreme values (SK > 0).

When the computed value of SK is positive, most of the scores of the students are very low, meaning that they performed poorly on the examination.

A distribution is negatively skewed when it is skewed to the left: it has a long tail extending off to the left but a short tail to the right. It indicates the presence of a high proportion of relatively large extreme values (SK < 0).

When the computed value of SK is negative, most of the students got very high scores, meaning that they performed very well on the examination.
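The coefficient of skewness can be computed with a Python sketch that reuses the sample statistics defined earlier in this module (the function name is ours):

```python
import math

def pearson_skewness(scores):
    """SK = 3(mean - median) / SD, using the sample standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    s = sorted(scores)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return 3 * (mean - median) / sd

quiz = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
print(round(pearson_skewness(quiz), 2))  # 0.37 -> slightly positively skewed
```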

Learning References:

[17] Navarro, R.L., Santos, R.G., and Corpuz, B.B. (2019). Assessment of Learning, OBE & PPST Based, Fourth Edition. Lorimar Publishing, Inc.
[18] Bartlett, J. (2015). Outstanding Assessment for Learning in the Classroom. Routledge, Taylor & Francis Group.
[19] Frey, N. and Fisher, D. (2011). The Formative Assessment Action Plan. USA: ASCD.
[20] Lewin and Shoemaker, B. (2011). Great Performances: Creating Classroom-Based Assessment Tasks, 2nd Edition. USA: ASCD.
[21] Ecclestone, K. et al. (2010). Transforming Formative Assessment in Lifelong Learning. UK: McGraw-Hill Open University Press.
[22] Airasian, Peter W. (2005). Classroom Assessment: Concepts and Applications. New York: McGraw-Hill Companies, Inc.
[23] Shermis, Mark D. and Di Vesta, Francis J. (2011). Classroom Assessment in Action. Rowman & Littlefield Publishers, Inc.
[24] Anderson, Lorin W. (2003). Classroom Assessment: Enhancing the Quality of Teacher Decision Making. Lawrence Erlbaum Associates, Publishers.

Post-Assessment Learning Activities:


1. Collect and record the data of prelim and midterm examination of your classmate in any subject, then computed the following
and interpret
1. Mean
2. Median
3. Mode
4. Standard deviation
5. Skewness
6. Range

Instructor Saimona M. Guyo


SKC COLLEGE FACULTY
