
GUESS WHAT?

Mechanics: The host will select two students, one from the upper part and one from the lower part of the class list, to answer a given basic question. Whoever gets the correct answer proceeds to the proper question about the topic. A clue will be flashed inside an illusion, and the student will try to find the correct answer. Players have only 10 seconds to answer; if a player fails, the other student may steal the chance.
How many EYELASHES
does SpongeBob have?
A process of collecting and analyzing evidence to support the meaningfulness and
usefulness of the test.

VALIDATION
What is the color of the
HEART in GMA 7?
It refers to the consistency of the scores obtained — how consistent they are for each individual
from one administration of an instrument to another and from one set of items to another.

RELIABILITY
How many BIRDS are
there in the Nestle logo?
It is a process which examines student responses to individual test items (questions)
in order to assess the quality of those items and of the test as a whole.

ITEM ANALYSIS
ITEM ANALYSIS
AND VALIDATION
CHAPTER 3



ITEM ANALYSIS

It is a statistical technique used for selecting and rejecting the items of a test on the basis of their difficulty value and discriminating power.



PHASES OF ITEM ANALYSIS

1. Try-out phase
   a. Test taking
   b. Arrange the scores from highest to lowest
   c. Determine the upper and lower groups.
2. Item analysis phase (level of difficulty)
3. Item revision phase



PURPOSE OF ITEM ANALYSIS

 Evaluates the quality of each item.
 May suggest ways of improving the measurement of a test.



3 IMPORTANT CHARACTERISTICS OF AN ITEM

1. Item Difficulty
2. Discrimination Index
3. Distractor Analysis
ITEM DIFFICULTY

Number of students who are able to answer the item correctly divided by the total number of students.

Item Difficulty = no. of students with correct answer / total no. of students
EXAMPLE

What is the difficulty index of an item if 25 students are unable to answer it correctly while 75 answered it correctly?

Total number of students = 100
Number of students who answered correctly = 75

Item difficulty = 75/100 = 0.75 or 75%
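A minimal sketch in Python of the difficulty formula above, using the numbers from this example (75 of 100 students answered correctly); the function name is illustrative, not from the deck.

def item_difficulty(num_correct: int, num_students: int) -> float:
    # Proportion of students who answered the item correctly
    if num_students == 0:
        raise ValueError("total number of students must be positive")
    return num_correct / num_students

# Numbers from the worked example: 75 of 100 students answered correctly
p = item_difficulty(75, 100)
print(f"Item difficulty = {p:.2f} ({p:.0%})")  # 0.75 (75%)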



Range of Difficulty    Interpretation      Action
0 - 0.25               Difficult           Revise or Discard
0.26 - 0.75            Right Difficulty    Retain
0.76 and above         Easy                Revise or Discard
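As a small illustration, the cut-offs in the table above can be applied directly to a computed difficulty index; the ranges mirror the deck's table, while the function name is an assumption of this sketch.

def interpret_difficulty(p: float) -> tuple[str, str]:
    # Cut-offs taken from the table above (0 - 0.25, 0.26 - 0.75, 0.76 and above)
    if p <= 0.25:
        return "Difficult", "Revise or discard"
    if p <= 0.75:
        return "Right difficulty", "Retain"
    return "Easy", "Revise or discard"

print(interpret_difficulty(0.75))  # ('Right difficulty', 'Retain')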



DISCRIMINATION INDEX

It is the difference between the proportion of correct responses among the students in the upper group (DU) and the proportion among the students in the lower group (DL).
• Upper group (DU): top 27% of the class
• Lower group (DL): bottom 27% of the class
DISCRIMINATION INDEX

 Positive Discrimination Index
 Negative Discrimination Index
 Zero Discrimination Index



27% UG: got the correct answer
27% LG: got an incorrect answer

How do we get the 27% upper and lower groups of the class?

Formula: no. of students in the class x 0.27

Total no. of students = 54
54 x 0.27 = 14.58, or 15
UG = 15 students (upper 27%)
LG = 15 students (lower 27%)

Find the index of discrimination.

Formula: DU - DL

The UG had a proportion correct of 0.60 or 60%: DU = 0.60
The LG had a proportion correct of 0.20 or 20%: DL = 0.20

Index of discrimination = 0.60 - 0.20 = 0.40
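A sketch of the same computation in Python: take the top and bottom 27% of a score-ranked class and compute DU - DL for one item. The helper names and the counts of correct answers (9 and 3, chosen to reproduce DU = 0.60 and DL = 0.20) are assumptions for illustration.

def split_upper_lower(total_scores, fraction=0.27):
    # Indices of the upper and lower 27% of students, ranked by total score
    n = round(len(total_scores) * fraction)   # 54 * 0.27 = 14.58 -> 15
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    return ranked[:n], ranked[-n:]

def discrimination_index(correct_upper, n_upper, correct_lower, n_lower):
    # DU - DL: proportion correct in the upper group minus the lower group
    return correct_upper / n_upper - correct_lower / n_lower

# 9 of 15 correct in the UG (0.60) and 3 of 15 in the LG (0.20), as in the example
d = discrimination_index(9, 15, 3, 15)
print(f"Index of discrimination = {d:.2f}")   # 0.40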
RULE OF THUMB

Index Range      Interpretation                               Action
-1.0 to -0.50    Can discriminate but item is questionable    Discard
-0.55 to 0.45    Non-discriminating                           Revise
0.46 to 1.0      Discriminating item                          Include



INDEX OF DISCRIMINATION RANGE

-1.0 to 1.0

At -1.0, only the LG got the correct answers; at +1.0, only the UG got the correct answers.



STEPS:

1. Construct test items
2. Conduct an item try-out
   a) Test taking
   b) Arrange the scores
   c) Determine the lower and upper groups.
3. Construct an item analysis table



ITEM 1

                A    B    C    D
Total          12   25   13   10
Upper Group     3   10    2    1
Lower Group     5    6    5    1

STEPS IN ITEM ANALYSIS
1. Compute the item difficulty
2. Perform item discrimination
3. Analyse the distractors



FORMULAS
Item difficulty = no. of students with
correct answer/total no. of students.
Discrimination index = DU-DL
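A sketch that applies both formulas to the ITEM 1 table above. The deck does not state which option is the key, so treating B (the most-chosen option) as the correct answer is an assumption here.

# Option counts for ITEM 1 from the table: whole class, upper group, lower group
total = {"A": 12, "B": 25, "C": 13, "D": 10}
upper = {"A": 3, "B": 10, "C": 2, "D": 1}
lower = {"A": 5, "B": 6, "C": 5, "D": 1}
key = "B"  # assumed key; not stated in the deck

n_students = sum(total.values())          # 60
difficulty = total[key] / n_students      # 25/60, about 0.42 (right difficulty)

du = upper[key] / sum(upper.values())     # 10/16, about 0.63
dl = lower[key] / sum(lower.values())     # 6/17, about 0.35
discrimination = du - dl                  # about 0.27

print(f"difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")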



Range of Difficulty    Interpretation      Action
0 - 0.25               Difficult           Revise or Discard
0.26 - 0.75            Right Difficulty    Retain
0.76 and above         Easy                Revise or Discard

Index Range      Interpretation                               Action
-1.0 to -0.50    Can discriminate but item is questionable    Discard
-0.55 to 0.45    Non-discriminating                           Revise
0.46 to 1.0      Discriminating item                          Include



PRINCIPLES OF ANALYSING DISTRACTORS

1. If the upper group and the lower group did not choose the distractor.
2. If the distractor is more attractive to the upper group than to the lower group.
3. If both the upper group and the lower group have the same total number of answers on the distractor.
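A sketch of a distractor check along the lines of these principles: for each incorrect option, compare how often the upper and lower groups chose it. The flag wordings and the assumed key (B) are illustrative, not from the deck.

def analyse_distractors(upper, lower, key):
    # upper/lower are dicts of option -> number of students choosing it
    notes = {}
    for option in upper:
        if option == key:
            continue
        u, l = upper[option], lower[option]
        if u == 0 and l == 0:
            notes[option] = "chosen by neither group - implausible distractor, revise"
        elif u > l:
            notes[option] = "more attractive to the upper group - check the item and key"
        elif u == l:
            notes[option] = "equally attractive to both groups - not discriminating"
        else:
            notes[option] = "more attractive to the lower group - working as intended"
    return notes

# Upper/lower counts from the ITEM 1 table, key assumed to be B
print(analyse_distractors({"A": 3, "B": 10, "C": 2, "D": 1},
                          {"A": 5, "B": 6, "C": 5, "D": 1}, key="B"))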



BASIC ITEM ANALYSIS STATISTICS

The Michigan State University Measurement and Evaluation Department reports a number of item statistics which aid in evaluating the effectiveness of an item.



BASIC ITEM ANALYSIS STATISTICS

The first of these is the index of difficulty, which MSU defines as the proportion of the total group who got the item wrong. Thus a high index indicates a difficult item and a low index indicates an easy item.



BASIC ITEM ANALYSIS STATISTICS

Whichever index is selected is shown as the INDEX OF DIFFICULTY on the item analysis print-out. For classroom achievement tests, most test constructors desire items with indices of difficulty no lower than 20 nor higher than 80, with an average index of difficulty from 30 or 40 to a maximum of 60.



INDEX OF DISCRIMINATION

It is the difference between the proportion of the upper group who got an item right and the proportion of the lower group who got the item right.



INDEX OF DISCRIMINATING POWER

D = (RU - RL) / (½T)

Where:
P – percentage who answered the item correctly (index of difficulty)
R – number who answered the item correctly
T – total number who tried the item



INDEX OF DISCRIMINATING POWER

The smaller the percentage figure, the more difficult the item.

Estimate the item discriminating power using the formula below:

D = (RU - RL) / (½T) = (6 - 2) / 10 = 0.40
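A minimal sketch of this formula with the numbers on the slide (RU = 6, RL = 2); the total of T = 20 examinees is an assumption implied by ½T = 10.

def discriminating_power(ru: int, rl: int, t: int) -> float:
    # D = (RU - RL) / (T/2), where T is the total number who tried the item
    return (ru - rl) / (t / 2)

print(discriminating_power(6, 2, 20))  # 0.4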



MORE SOPHISTICATED
DISCRIMINATION INDEX



ITEM
DISCRIMINATION
Refers to the ability of an item to differentiate
among students on the basis of how well they
know the material being tested.



Various hand calculation procedures have traditionally been used to compare item responses to total test scores using high and low scoring groups of students.

Computerized analyses provide a more accurate assessment of the discrimination power of items because they take into account the responses of all students rather than just the high and low scoring groups.
The Item discrimination index provided by
ScorePak® is a Pearson Product Moment
correlation between student responses to a
particular item and total scores on all other
items on the test. It provides an estimate of the
degree to which an individual item is measuring
the same thing as the rest of the items.



Discrimination
Coefficients
• Point biserial – The point biserial (rpbis) correlation is used to find out if the right people are getting the items right, how much predictive power the item has, and how it would contribute to predictions.
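A sketch of the kind of item-total statistic described above: a Pearson correlation between the 0/1 scores on one item and the total score on all other items (a point biserial, since the item score is dichotomous). This is plain Python with made-up data and makes no claim to match ScorePak's exact computation.

import math

def pearson(x, y):
    # Plain Pearson product-moment correlation
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def item_total_correlation(responses, item):
    # Correlate one item's 0/1 scores with the total on all *other* items
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)

# Toy score matrix: one row of 0/1 item scores per student (illustrative only)
responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]
print(round(item_total_correlation(responses, item=0), 2))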



 The discrimination index reflects the degree to which an item and the test as a whole are measuring a unitary ability or attribute; values of the coefficient will tend to be lower for tests measuring a wide range of content.

 Items with negative indices should be examined to determine why a negative value was obtained. Tests with high internal consistency consist of items with mostly positive relationships with the total test score.



ScorePak® Classifies Item Discrimination

ScorePak® classifies item discrimination as "good" if the index is above .30, "fair" if it is between .10 and .30, and "poor" if it is below .10.
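Those cut-offs translate directly into a small helper; the thresholds are the ones stated above, and the function name is an assumption of this sketch.

def classify_discrimination(index: float) -> str:
    # ScorePak-style label: good above .30, fair between .10 and .30, poor below .10
    if index > 0.30:
        return "good"
    if index >= 0.10:
        return "fair"
    return "poor"

print(classify_discrimination(0.27))  # fair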



At the end of the Item Analysis report, test items are listed according to their degrees of difficulty (easy, medium, hard) and discrimination (good, fair, poor). These distributions provide a quick overview of the test and can be used to identify items which are not performing well and which can perhaps be improved or discarded.



VALIDATION

Process of collecting and analyzing evidence to support the meaningfulness and usefulness of the test.



VALIDITY

It is the extent to which a test measures what it purports to measure; it refers to the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results.



3 TYPES OF EVIDENCE OF VALIDITY

• Content-related evidence of validity
• Criterion-related evidence of validity
• Construct-related evidence of validity



CONTENT-RELATED EVIDENCE OF VALIDITY

Refers to the content and format of the instrument.



CRITERION-RELATED EVIDENCE OF VALIDITY

Refers to the relationship between scores obtained using the instrument and scores obtained using one or more other tests.



CONSTRUCT-RELATED EVIDENCE OF VALIDITY

Refers to the nature of the psychological construct or characteristic being measured by the test.



PROCEDURE FOR DETERMINING CONTENT VALIDITY

• The teacher writes out the objectives of the test.
• The teacher gives these to the experts together with the test.
• The experts look at the objectives and read the items in the test.
• They place a checkmark in front of each question or item that they feel does not measure one or more objectives.



PROCEDURE FOR DETERMINING CONTENT VALIDITY

• They also place a checkmark in front of each objective not assessed by any item in the test.
• The teacher rewrites any item so checked and resubmits it to the experts.
• This continues until the experts approve of all items and agree that all of the objectives are sufficiently covered by the test.



TO OBTAIN CRITERION-RELATED EVIDENCE OF VALIDITY

The teacher usually compares scores on the test in question with scores on some other independent criterion test which presumably already has high validity.



TYPES OF CRITERION-RELATED EVIDENCE OF VALIDITY

a. Concurrent Validity
b. Predictive Validity



  Grade Point Average
TEST SCORE    VERY GOOD    GOOD    NEEDS IMPROVEMENT
HIGH              20         10            5
AVERAGE           10         25            5
LOW                1         10           14

EXPECTANCY TABLE



RELIABILITY

Reliability refers to the consistency of the scores obtained — how consistent they are for each individual from one administration of an instrument to another and from one set of items to another.



VALIDITY AND RELIABILITY

These are related concepts. If an instrument is unreliable, it cannot yield valid outcomes. As reliability improves, validity may improve (or it may not).



However, if an instrument is shown to be scientifically valid, then it is almost certain that it is also reliable. Item reliability is the consistency of a set of items (variables); that is, to what extent they measure the same thing.



When a set of items is consistent, they can make a measurement scale such as a sum scale.

The table below is a standard followed almost universally in educational tests and measurement:

RELIABILITY      Interpretation
.90 and above    Excellent reliability; at the level of the best standardized tests.
.80 - .90        Very good for a classroom test.
.70 - .80        Good for a classroom test; in the range of most. There are probably a few items which could be improved.
.60 - .70        Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.50 - .60        Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
.50 and below    Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
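The deck does not say which reliability coefficient these bands refer to. A common internal-consistency estimate for a set of scored items is Cronbach's alpha, sketched below under that assumption with made-up data.

def cronbach_alpha(scores):
    # scores: students x items matrix (list of lists of item scores)
    n_items = len(scores[0])

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Toy 0/1 score matrix (illustrative only)
scores = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
]
print(round(cronbach_alpha(scores), 2))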
 
