Course Outcomes:
distinguish the uses of item analysis, validity, reliability, difficulty index, discrimination index
(PO-Cc)
determine the validity and reliability of given test items (PO-Dd) (PO-Ec)
INTRODUCTION
This lesson covers the purpose of validation in order to determine the characteristics of the
whole test itself, namely, the validity and reliability of the test (Navarro et al., 2019).
Preliminary Questions:
II. CONNECT
After performing the item analysis and revising the items which need revision, the next step
is to validate the instrument. The purpose of validation is to determine the characteristics of the
whole test itself, namely validity and reliability of the test.
What is validation?
What is validity?
Validity refers to how well a test measures what it is purported to measure. It refers also to
the appropriateness, correctness, meaningfulness and usefulness of the specific decisions a
teacher makes based on the test results.
Why is it necessary?
While reliability is necessary, it alone is not sufficient: a test can be reliable without
being valid. For example, if your bathroom scale is off by 5 lbs, it reads your weight every day
as 5 lbs more than it actually is. The scale is reliable because it consistently reports the same
weight every day, but it is not valid because it adds 5 lbs to your true weight. It is not a valid
measure of your weight.
A teacher who conducts test validation might want to gather different kinds of evidence.
There are essentially three main types of evidence that may be collected:
1. Construct Validity is used to ensure that the measure actually measures what it is
intended to measure (i.e., the construct), and not other variables. Using a panel of “experts” familiar
with the construct is one way this type of validity can be assessed. The experts can examine
the items and decide what each specific item is intended to measure. Students can be involved in
this process to obtain their feedback.
Example: A women’s studies program may design a cumulative assessment of learning
throughout the major. If the questions are written with complicated wording and phrasing, the test
can inadvertently become a test of reading comprehension rather than a test of women’s studies.
It is important that the measure actually assesses the intended construct rather than an
extraneous factor.
2. Criterion-Related Validity is used to predict future or current performance - it correlates
test results with another criterion of interest.
Example: Suppose a physics program designed a measure to assess cumulative student
learning throughout the major. The new measure could be correlated with a standardized measure
of ability in this discipline, such as an ETS field test or the GRE subject test. The higher the
correlation between the established measure and the new measure, the more faith stakeholders can
have in the new assessment tool.
3. Sampling Validity (similar to content validity) ensures that the measure covers the broad range
of areas within the concept under study. Not everything can be covered, so items need to be
sampled from all of the domains. This may need to be completed using a panel of “experts” to
ensure that the content area is adequately sampled. Additionally, a panel can help limit “expert”
bias (i.e. a test reflecting what an individual personally feels are the most important or relevant
areas).
Example: When designing an assessment of learning in the theatre department, it would not
be sufficient to only cover issues related to acting. Other areas of theatre such as lighting, sound,
functions of stage managers should all be included. The assessment should reflect the content
area in its entirety.
What is reliability?
Reliability is the degree to which an assessment tool produces stable and consistent
results; it refers to the consistency of the scores obtained.
Types of Reliability
1. Test-retest reliability is a measure of reliability obtained by administering the same test
twice over a period of time to a group of individuals. The scores from Time 1 and Time 2 can then
be correlated in order to evaluate the test for stability over time.
Example: A test designed to assess student learning in psychology could be given to a group
of students twice, with the second administration perhaps coming a week after the first. The
obtained correlation coefficient would indicate the stability of the scores.
The following table is a standard followed almost universally in educational testing and
measurement.
Reliability       Interpretation
0.90 and above    Excellent reliability; at the level of the best standardized tests.
                  (Very high reliability)
0.80 – 0.90       Very good for a classroom test.
                  (High reliability)
0.70 – 0.80       Good for a classroom test; in the range of most. There are
                  probably a few items which could be improved.
                  (Average/moderate reliability)
0.60 – 0.70       Somewhat low. The test needs to be supplemented by other
                  measures (e.g., more tests) to determine grades. There are
                  probably some items which could be improved.
                  (Low reliability)
0.50 – 0.60       Suggests need for revision of the test, unless it is quite short
                  (ten or fewer items). The test definitely needs to be
                  supplemented by other measures (e.g., more tests) for grading.
                  (Very low reliability)
0.50 or below     Questionable reliability. This test should not contribute heavily
                  to the course grade, and it needs revision.
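The interpretation bands in the table above can be captured in a small helper, which may be handy when scoring several classroom tests at once. This is a minimal sketch; the function name and the exact boundary handling (each band taken as a half-open interval from its lower limit) are illustrative, not part of the module:

```python
def interpret_reliability(r):
    """Return the interpretation band for a reliability coefficient r,
    following the standard table (lower bound inclusive)."""
    if r >= 0.90:
        return "Very high reliability"
    elif r >= 0.80:
        return "High reliability"
    elif r >= 0.70:
        return "Average/moderate reliability"
    elif r >= 0.60:
        return "Low reliability"
    elif r >= 0.50:
        return "Very low reliability"
    else:
        return "Questionable reliability"

print(interpret_reliability(0.82))  # High reliability
```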
Example. Spearman rho Computation of the First and Second Administration of Achievement
Test in English (Artificial Data)
Students    X     Y     Rx      Ry      D       D²
1           90    70     2.0     7.5   -5.5    30.25
2           43    31    13.0    12.5    0.5     0.25
3           84    79     6.5     3.0    3.5    12.25
4           86    70     4.5     7.5   -3.0     9.00
5           55    43    11.0    10.5    0.5     0.25
6           77    70     8.5     7.5    1.0     1.00
7           84    75     6.5     4.5    2.0     4.00
8           91    88     1.0     1.0    0.0     0.00
9           40    31    14.0    12.5    1.5     2.25
10          75    70    10.0     7.5    2.5     6.25
11          86    80     4.5     2.0    2.5     6.25
12          89    75     3.0     4.5   -1.5     2.25
13          48    30    12.0    14.0   -2.0     4.00
14          77    43     8.5    10.5   -2.0     4.00
Total                                  ∑D² =   82.00
Calmorin, 2004
Solution:
rs = 1 - (6∑D²) / (N³ - N)
   = 1 - 6(82) / (14³ - 14)
   = 1 - 492 / (2744 - 14)
   = 1 - 492 / 2730
   = 1 - 0.18
   = 0.82

The obtained coefficient of 0.82 indicates high reliability: the scores are stable across the
two administrations.
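The hand computation above can be verified with a short program. This is a minimal sketch (the function names are illustrative): it assigns ranks from the highest score down, averages tied ranks, and applies the Spearman rho formula to the artificial data from the table:

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest, averaging tied ranks."""
    ordered = sorted(scores, reverse=True)
    # A score first appearing at index i with c ties occupies ranks
    # i+1 .. i+c, whose average is i + (c + 1) / 2.
    return [ordered.index(s) + (ordered.count(s) + 1) / 2 for s in scores]

def spearman_rho(x, y):
    """Spearman rank correlation: rs = 1 - 6*sum(D^2) / (N^3 - N)."""
    rx, ry = average_ranks(x), average_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - 6 * d_squared / (n ** 3 - n)

# Artificial first and second administrations of the achievement test
first = [90, 43, 84, 86, 55, 77, 84, 91, 40, 75, 86, 89, 48, 77]
second = [70, 31, 79, 70, 43, 70, 75, 88, 31, 70, 80, 75, 30, 43]
print(round(spearman_rho(first, second), 2))  # 0.82
```

Strictly speaking, the simple rs formula is an approximation when tied ranks are present, which is why a dedicated statistics library may report a slightly different value; for classroom use the approximation matches the hand method shown above.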
Further readings.
https://study.com/academy/lesson/test-retest-reliability-coefficient-examples-lesson-quiz.html
2. Internal consistency reliability measures whether the items in a test that are intended to
assess the same construct produce consistent scores. For dichotomously scored items
(1 = correct, 0 = incorrect), it can be estimated with the Kuder-Richardson Formula 20 (KR-20):

ρKR20 = [k / (k - 1)] × [1 - (∑pjqj) / σ²]

Where
k = number of questions
pj = proportion of the examinees who answered question j correctly
qj = 1 - pj, the proportion who answered question j incorrectly
σ² = variance of the total scores of all the people taking the test

Values range from 0 to 1. A high value indicates reliability, while too high a value (in excess of .90)
indicates a homogeneous test. For example, a computed value of ρKR20 = 0.738 indicates that the
test has acceptable reliability for a classroom test.
http://www.real-statistics.com/reliability/internal-consistency
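The KR-20 computation can likewise be sketched in a few lines without a spreadsheet. The data below are invented purely for illustration (four students, three items); the function follows the KR-20 formula, using the population variance of the total scores:

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomously scored items.

    responses: one row per student; each row holds 1 (correct) or
    0 (incorrect) for each item.
    """
    n = len(responses)         # number of students
    k = len(responses[0])      # number of items
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n  # population variance
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct on item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / variance)

# Invented responses: 4 students x 3 items
data = [[1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
print(kr20(data))  # 0.75
```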
Further readings.
http://korbedpsych.com/LinkedFiles/CalculatingReliability.pdf
III. COLLABORATE
Practice exercise. Test-retest Method
Solve for the reliability applying the Spearman rank correlation coefficient or Spearman rho.
Students    X     Y     Rx      Ry      D       D²
1 85 70
2 43 38
3 55 43
4 77 70
5 84 75
6 88 88
7 45 40
8 75 70
9 86 80
10 89 75
Total                                  ∑D² =
IV. CREATE
I. Multiple Choice.
Direction: Read each problem carefully and determine the correct answer from the given
options. Check the box beside your choice.
Calmorin, 2004
II. Problem Solving. Use extra paper for the solutions.
Using the data below, find out if the test is reliable using the test-retest method administered
to 15 students as a pilot sample. Use extra paper for the solutions.
Students    X     Y     Rx      Ry      D       D²
1 28 29
2 83 83
3 44 45
4 77 49
5 80 79
6 95 95
7 88 87
8 45 45
9 83 84
10 79 80
11 82 83
12 25 25
13 77 79
14 38 39
15 70 72
Total                                  ∑D² =
V. ASSIGNMENT
Find out if there is internal consistency in the responses of the 12 students used as a pilot
sample in a 10-item test in Mathematics. Show your solutions. Use extra paper.
Students
Items    1   2   3   4   5   6   7   8   9   10   11   12    f    pi    qi    piqi
1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 1 1 1 1 1
4 0 1 1 1 1 1 1 1 1 1 1 0
5 0 1 1 1 1 1 1 1 1 1 1 0
6 1 0 1 1 1 1 1 1 0 1 1 0
7 1 0 1 0 1 1 1 1 0 1 0 0
8 1 0 1 0 1 1 1 1 0 0 0 0
9 0 0 0 0 1 1 1 1 0 0 0 0
10 0 0 0 0 0 0 1 1 0 0 0 0
Total
References:
Books:
Navarro, Rosita L., Santos, Rosita G., Corpuz, Brenda B. (2019). Assessment of Learning 1. OBE
and PPST Based. Fourth Edition. Quezon City: Lorimar Publishing, Inc.
Navarro, Rosita L., Santos, Rosita G., Corpuz, Brenda B. (2017). Assessment of Learning 1. OBE
and K12 Based. Third Edition. Quezon City: Lorimar Publishing, Inc.
De Guzman, Estefania S. and Adamos, Joel L. (2015). Assessment of Learning 1. Quezon City:
Adrian Publishing Co., Inc.
Gonzales, Jacobo O. and Nocon, Rizaldi C. (2015). Essential Statistics. 2015 Edition. Quezon
City: MaxCor Publishing House Inc.
Corpuz, Brenda B. and Salandanan, Gloria G. (2015). Principles of Teaching 2 with TLE. Lorimar
Publishing, Inc.
Navarro, Rosita L. et al. (2012). Assessment of Learning Outcomes. Second Edition. Lorimar
Publishing, Inc.
Buendicho, Flordeliza C. (2011). Assessment of Learning 1. Rex Bookstore
Santos, Rosita de-Guzman, Ph.D. (2007). Assessment of Learning 2. Quezon City: Lorimar
Publishing, Inc.
Calmorin, Laurentina P. (2004). Educational Research Measurement & Evaluation. (3rd ed.)
National Bookstore
Padua, Roberto N. et al. (1997). Educational Evaluation and Measurement. Katha Publishing
Co.
Calmorin, Laurentina P. (1994). Educational Research Measurement & Evaluation. National
Bookstore
Calderon and Gonzales. (1993). Measurement and Evaluation. National Bookstore.
Website:
https://chfasoa.uni.edu/reliabilityandvalidity.htm
https://study.com/academy/lesson/test-retest-reliability-coefficient-examples-lesson-quiz.html
http://korbedpsych.com/LinkedFiles/CalculatingReliability.pdf
http://www.real-statistics.com/reliability/internal-consistency-reliability/kuder-richardson-formula-20/