
CHARACTERISTICS OF A GOOD TEST
Learning Outcomes: At the end of the
chapter, you must have:
• enumerated the different ways of establishing
validity and reliability of different assessment
tools
• identified the different factors affecting the
validity and reliability of the test
• computed and interpreted the validity and
reliability coefficient
VALIDITY
• A good test must first of all be valid.
• Validity refers to the extent to which a test measures
what it purports to measure. This is related to the
purpose of the test. If the purpose of the test is to
determine the competency in adding two-digit
numbers, then the test items will be about addition of
these two-digit numbers. Thus, if the objective matches
the test items prepared, the test is said to be valid.
There are different ways of establishing
validity.
• FACE VALIDITY
• is done by examining the physical appearance of the instrument.

• CONTENT VALIDITY
• is done through a careful and critical examination of the objectives
of assessment so that it reflects the curricular objectives.
For instance, the teacher wishes to evaluate a test in English. She
requests experts in English to validate whether the test items measure the
knowledge, skills, and values they are supposed to measure as stated in the
course content/syllabus.
• CRITERION-RELATED VALIDITY
• is established statistically such that a set of scores revealed by
the measuring instrument is correlated with the scores
obtained in another external predictor or measure. It has two
types: concurrent and predictive validity.
• Concurrent validity – describes the present status of the individual
by correlating the sets of scores obtained from two measures
given concurrently.
For instance, the teacher wants to validate a Mathematics
achievement test given to a group of mathematics students. The result of
the test is correlated with an accepted Mathematics test which
has previously been proven valid. If the correlation is high, the
Mathematics test that he constructed is valid.
• Predictive validity – describes the future performance of
an individual by correlating the sets of scores obtained
from two measures given at a longer time interval.
For instance, the teacher wishes to estimate how well a
student may do in graduate courses on the basis of how
well he has done on the tests he took in his undergraduate
courses. The criterion measure against which the test scores
are validated becomes available only after a long time interval.
• CONSTRUCT-RELATED VALIDITY
• This is the extent to which the test measures a theoretical,
unobservable quality such as understanding, math
achievement, performance anxiety, and the like, over a period
of time, on the basis of gathered evidence. It is established
through intensive study of the test or measurement
instrument using convergent/divergent validation and factor
analysis.
• Convergent validity – is a type of construct validation wherein a
test has high correlation with another test that measures the
same construct.
• Divergent validity – is a type of construct validation wherein
a test has low correlation with a test that measures a
different construct. In this case, high validity occurs only
when there is a low correlation coefficient between the tests
that measure different traits. A correlation coefficient in this
instance is also called a validity coefficient.
• Factor analysis – is another method of assessing the
construct validity of a test using complex statistical
procedures.
Factors Affecting Validity
• Poorly constructed test items
• Unclear directions
• Ambiguous test items
• Too difficult vocabulary
• Unintended clues
• Complicated syntax
• Inadequate time limit
• Inappropriate level of difficulty
• Improper arrangement of test items
RELIABILITY
• Another characteristic of a good test is reliability.
Reliability refers to the consistency of test scores.
Test scores may vary under different conditions.
The reliability of test scores is usually reported by
a reliability coefficient. A reliability coefficient is
also a correlation coefficient.
TEST-RETEST METHOD
• In this method, the same test is administered twice to
the same group of students with a time interval between
the two administrations. The two sets of test scores are
correlated using the Pearson Product-Moment Correlation
Coefficient or the Spearman rho formula, and this
correlation provides a measure of stability. It indicates
how stable or consistent the test results are over a period of
time. The formulae are:
• Pearson Formula

r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}

where X is the first set of scores, Y is the second set of scores, and N is
the number of cases.
• Spearman rho Formula

ρ = 1 − (6Σd²) / [n(n² − 1)]

where ρ stands for Spearman rho, Σd² is the sum of the squared differences
between ranks, and n is the number of cases.
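As an illustration (not part of the original chapter), the two formulae above can be written as short Python functions; the function names are my own:

```python
import math

def pearson_r(x, y):
    """Pearson Product-Moment Correlation, raw-score formula."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = math.sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
                    * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

def spearman_rho(sum_d_squared, n):
    """Spearman rho from the sum of squared rank differences."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))
```

pearson_r works directly on the two sets of raw scores, while spearman_rho expects the ranks to have been computed and differenced already, exactly as in the worked examples later in the chapter.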
• EQUIVALENT FORM
• It is also known as the Parallel or Alternate Forms method. In this
method, two different but equivalent forms of the test are
administered to the same group of students within a close
time interval. The two forms of the test must be
constructed so that the content, type of test items, difficulty,
and instructions for administration are similar but not
identical.
For instance, a Form A item asks, “How many meters are there in 8
kilometers?” while the corresponding Form B item asks, “How many kilometers
are there in 8 000 meters?” The results of the two tests are correlated using
the Pearson Product-Moment Correlation Coefficient, and this correlation
provides a measure of equivalence of the tests.
• TEST-RETEST WITH EQUIVALENT FORMS METHOD
• It is done by giving equivalent forms of the test with an
increased time interval between forms. The results of the
test scores are correlated using the Pearson Product-Moment
Correlation Coefficient, and this correlation provides
measures of both stability and equivalence of the tests.
• SPLIT-HALF METHOD
• In this method, the test is administered once, and the two
equivalent halves of the test are scored separately. The common
procedure is to divide the test into odd-numbered and even-
numbered items. The two halves of the test must be similar
but not identical in content, number of items, and difficulty.
This provides two scores for each student. The scores
obtained on the two halves are correlated using the Pearson r. The
result is the reliability coefficient of a half test. Since this
reliability holds only for a half test, the reliability coefficient
of the whole test is estimated by using the Spearman-Brown
formula. The Spearman-Brown formula is as follows:

r_whole = 2r_half / (1 + r_half)

Where: r_whole = reliability of the whole test
r_half = reliability of the half test
• This correlation coefficient provides a measure of
internal consistency. It indicates the degree to which
consistent results are obtained from the two halves of the
test.
• KUDER-RICHARDSON METHOD
• In this method, the test is administered once, the total
test is scored, and then the proportion/percentage of the
students passing and not passing each item is used. It has
two types: KR-20 and KR-21.
• KUDER-RICHARDSON 20 (KR-20) is applicable only in
situations where students’ responses are scored
dichotomously, and therefore is most useful with
traditional test items that are scored as right or wrong,
true or false, and yes or no. It uses the formula:

KR20 = [k/(k − 1)][1 − Σpq/σ²]

Where: k = number of items
p = proportion of the students who got the item correct (difficulty
index)
q = proportion of the students who got the item wrong (q = 1 − p)
σ² = variance of the total scores, computed as

σ² = [ΣX² − (ΣX)²/N] / (N − 1)

where N = number of students, ΣX² = summation of the squares of the scores, and
ΣX = summation of the scores.
• KUDER-RICHARDSON 21 (KR – 21) is not limited to test
items that are scored dichotomously. It uses the formula:

KR21 = [k/(k − 1)][1 − M(k − M)/(kσ²)]

where k = number of items
M = mean of the total scores
σ² = variance of the total scores
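Both formulas can be sketched in Python (an illustration, not from the chapter; statistics.variance is the sample variance with N − 1 in the denominator, which matches the worked KR-21 example later on):

```python
from statistics import mean, variance  # sample variance: divides by N - 1

def kr20(item_matrix):
    """KR-20 from dichotomous data: rows = students, columns = 0/1 items."""
    k = len(item_matrix[0])                    # number of items
    totals = [sum(row) for row in item_matrix]
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / len(item_matrix)  # difficulty index
        pq += p * (1 - p)                      # p * q for this item
    return (k / (k - 1)) * (1 - pq / variance(totals))

def kr21(k, total_scores):
    """KR-21 from the number of items and the students' total scores only."""
    m = mean(total_scores)
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * variance(total_scores)))
```

Note the difference in inputs: KR-20 needs the full item-by-student matrix to form Σpq, while KR-21 needs only the total scores and the number of items.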
Factors Affecting Reliability of a Test
•Length of the test
•Item difficulty
•Objective scoring
•Heterogeneity of the student group
•Limited time
• RELIABILITY COEFFICIENT

A reliability coefficient is a measure of the amount of error
associated with the test scores.
• Description of Reliability Coefficient
• The range of the reliability coefficient is from 0 to 1.0.
• The acceptable range is a value of 0.60 or higher.
• The higher the value of the reliability coefficient, the more
reliable the overall test scores are.
• Interpretation of Reliability Coefficient
• The group variability will affect the size of the reliability coefficient.
Higher coefficients result from heterogeneous groups than from
homogeneous groups. As group variability increases, reliability goes
up.
• Scoring reliability limits test score reliability. If tests are scored
unreliably, error is introduced. This will limit the reliability of the test
scores.
• Test length affects test score reliability. As the length increases, the
test’s reliability tends to go up.
• Item difficulty affects test score reliability. As test items become very
easy or very difficult, the test’s reliability goes down.
Reliability Coefficient   Interpretation
0.91 – 1.00   Excellent reliability. Very ideal for a classroom test.
0.81 – 0.90   Very high reliability. Very good for a classroom test.
0.71 – 0.80   High reliability. Good for a classroom test. There are probably
a few items that need to be improved.
0.61 – 0.70   Moderate reliability. The test needs to be supplemented by
other measures (more tests) to determine grades.
0.51 – 0.60   Low reliability. Suggests the need for revision of the test, unless it
is quite short (ten or fewer items). Needs to be supplemented
by other measures (more tests) to determine grades.
0.50 and below   Questionable reliability. This test should not contribute heavily
to the course grade, and it needs revision.
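The interpretation table can be turned into a small lookup function (an illustration; the function name is my own, and the cut-offs follow the table above):

```python
def interpret_reliability(r):
    """Return the descriptive level for a reliability coefficient r."""
    if r >= 0.91:
        return "Excellent reliability"
    if r >= 0.81:
        return "Very high reliability"
    if r >= 0.71:
        return "High reliability"
    if r >= 0.61:
        return "Moderate reliability"
    if r >= 0.51:
        return "Low reliability"
    return "Questionable reliability"
```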
• Example 1. Prof. Santos administered a test to his 10 students in Elementary Statistics
class twice, with a one-day interval. The test given after one day is exactly the same
test given the first time. The scores below were gathered in the first test (X) and second
test (Y). Using the test-retest method, is the test reliable? Show the complete solution
using the Pearson r formula.
Student First Test (X) Second Test (Y)
1 36 38
2 26 34
3 38 38
4 15 27
5 17 25
6 28 26
7 32 35
8 35 36
9 12 19
10 35 38
Solution: Find ΣX, ΣY, ΣXY, ΣX², and ΣY².
Student | First Test (X) | Second Test (Y) | XY | X² | Y²
1 36 38 1 368 1 296 1 444
2 26 34 884 676 1 156
3 38 38 1 444 1 444 1 444
4 15 27 405 225 729
5 17 25 425 289 625
6 28 26 728 784 676
7 32 35 1 120 1 024 1 225
8 35 36 1 260 1 225 1 296
9 12 19 228 144 361
10 35 38 1 330 1 225 1 444
ΣX = 274, ΣY = 316, ΣXY = 9 192, ΣX² = 8 332, ΣY² = 10 400

r = [10(9 192) − (274)(316)] / √{[10(8 332) − (274)²][10(10 400) − (316)²]}
= 5 336 / √[(8 244)(4 144)] ≈ 0.91

Analysis: The reliability coefficient using the Pearson r is 0.91,
which means that the test has excellent reliability. The
scores of the 10 students tested twice with a one-day
interval are consistent. Hence, the test is very ideal for
a classroom test.
Compute the reliability coefficient of the same data using
Spearman rho. Is the test reliable?
Solution: Rank the scores in the first test (X), then rank the scores in the second test (Y).
Get the difference between each pair of ranks to get d. Then multiply d by itself to get d².
Student | First Test (X) | Second Test (Y) | Rank of X | Rank of Y | Difference between Ranks (d) | Square of the Difference (d²)
1 36 38 2 2 0 0
2 26 34 7 6 1 1
3 38 38 1 2 -1 1
4 15 27 9 7 2 4
5 17 25 8 9 -1 1
6 28 26 6 8 -2 4
7 32 35 5 5 0 0
8 35 36 3.5 4 -0.5 0.25
9 12 19 10 10 0 0
10 35 38 3.5 2 1.5 2.25
Σd² = 13.5

ρ = 1 − [6(13.5)] / [10(10² − 1)] = 1 − 81/990 ≈ 0.92

• Analysis: The reliability coefficient using the
Spearman rho is 0.92, which means that the test has
excellent reliability. The scores of the 10 students
tested twice with a one-day interval are
consistent. Hence, the test is very ideal for a
classroom test.
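The tied ranks in the table above (e.g. rank 3.5 for the two scores of 35, and rank 2 for the three scores of 38) can be produced with a small helper (an illustration; the function name is my own):

```python
def rank_with_ties(scores):
    """Rank scores from highest (rank 1) downward, giving tied scores the
    average of the rank positions they occupy (e.g. two scores tied for
    3rd and 4th place each get rank 3.5)."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for s in scores:
        # All positions (1-based) where this score appears in the ordering.
        positions = [i + 1 for i, v in enumerate(ordered) if v == s]
        ranks.append(sum(positions) / len(positions))
    return ranks
```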
Example 2: Prof. Geronimo administered a test to her 10 students in Biology class
twice, with a one-week interval. The test given after one week is a parallel form of
the test conducted the first time. The scores below were gathered
in the first test (X) and the parallel test (Y). Using the equivalent or parallel form
method, is the test reliable? Show the complete solution using the Pearson r
formula.
Student | First Test (X) | Parallel Test (Y)
1 12 20
2 20 22
3 19 23
4 17 20
5 25 25
6 22 20
7 15 19
8 16 18
9 23 25
10 21 24
Solution: Find ΣX, ΣY, ΣXY, ΣX², and ΣY².
Student | First Test (X) | Parallel Test (Y) | XY | X² | Y²
1 12 20 240 144 400
2 20 22 440 400 484
3 19 23 437 361 529
4 17 20 340 289 400
5 25 25 625 625 625
6 22 20 440 484 400
7 15 19 285 225 361
8 16 18 288 256 324
9 23 25 575 529 625
10 21 24 504 441 576
ΣX = 190, ΣY = 216, ΣXY = 4 174, ΣX² = 3 754, ΣY² = 4 724

r = [10(4 174) − (190)(216)] / √{[10(3 754) − (190)²][10(4 724) − (216)²]}
= 700 / √[(1 440)(584)] ≈ 0.76

Analysis: The reliability coefficient using the Pearson r is 0.76,
which means that the test has high reliability. The scores of
the 10 students tested twice with a one-week interval
are consistent. Hence, the test is good for a classroom
test, but there are probably a few items that need to be
improved.
Note: Compute the reliability coefficient of the same data using Spearman
rho. Is the test reliable?
Solution: Rank the scores in the first test (X), then rank the scores in the second test (Y).
Get the difference between each pair of ranks to get d. Then multiply d by itself to get d².

Student | First Test (X) | Second Test (Y) | Rank of X | Rank of Y | Difference between Ranks (d) | Square of the Difference (d²)
1 12 20 10 7 3 9
2 20 22 5 5 0 0
3 19 23 6 4 2 4
4 17 20 7 7 0 0
5 25 25 1 1.5 -0.5 0.25
6 22 20 3 7 -4 16
7 15 19 9 9 0 0
8 16 18 8 10 -2 4
9 23 25 2 1.5 0.5 0.25
10 21 24 4 3 1 1
Σd² = 34.5

ρ = 1 − [6(34.5)] / [10(10² − 1)] = 1 − 207/990 ≈ 0.79

• Analysis: The reliability coefficient using the
Spearman rho is 0.79, which means that the test has
high reliability. The scores of the 10 students
tested twice with a one-week interval are
consistent. Hence, the test is good for a classroom
test, but there are probably a few items that need to be
improved.
Example 3: Prof. Quinto administered a test to her 10 students in Filipino class. The test
was given only once. The students’ scores on the odd (O) and even (E) items below
were gathered. Using the split-half method, is the test reliable? Show the complete
solution using the Pearson r and Spearman-Brown formulas.
Student Odd Even
1 15 20
2 19 17
3 20 24
4 25 21
5 20 23
6 18 22
7 19 25
8 26 24
9 20 18
10 18 17
Step 1: Find ΣX, ΣY, ΣXY, ΣX², and ΣY², then compute the Pearson r of the two halves.
Student | Odd (X) | Even (Y) | XY | X² | Y²
1 15 20 300 225 400
2 19 17 323 361 289
3 20 24 480 400 576
4 25 21 525 625 441
5 20 23 460 400 529
6 18 22 396 324 484
7 19 25 475 361 625
8 26 24 624 676 576
9 20 18 360 400 324
10 18 17 306 324 289
ΣX = 200, ΣY = 211, ΣXY = 4 249, ΣX² = 4 096, ΣY² = 4 533

r = [10(4 249) − (200)(211)] / √{[10(4 096) − (200)²][10(4 533) − (211)²]}
= 290 / √[(960)(809)] ≈ 0.33

Step 2: Use the Spearman-Brown formula to get the reliability of the whole test.

r_whole = 2(0.33) / (1 + 0.33) ≈ 0.50

• Analysis: The reliability coefficient using the Spearman-Brown formula is 0.50, which
means the test has questionable reliability. Hence, the test items should be revised.
Prof. Madela administered a 40-item test in English for his Grade VI pupils
in Mayondon Elementary School. Below are the scores of 15 pupils; find the
reliability using the Kuder-Richardson 21 formula.
Student Score
1 16
2 25
3 35
4 39
5 25
6 18
7 19
8 22
9 33
10 36
11 20
12 17
13 26
14 35
15 39
Solve for the variance and the mean of the scores using the table below.
Student | Score (X) | X²
1 16 256
2 25 625
3 35 1 225
4 39 1 521
5 25 625
6 18 324
7 19 361
8 22 484
9 33 1 089
10 36 1 296
11 20 400
12 17 289
13 26 676
14 35 1 225
15 39 1 521
ΣX = 405, ΣX² = 11 917

Variance: σ² = [ΣX² − (ΣX)²/N] / (N − 1) = [11 917 − (405)²/15] / 14 = 982/14 ≈ 70.14

Mean: M = ΣX/N = 405/15 = 27
Solve for the reliability coefficient using the Kuder-Richardson 21 formula:

KR21 = [40/(40 − 1)][1 − 27(40 − 27)/(40 × 70.14)] = (40/39)(1 − 351/2 805.60) ≈ 0.90

Analysis: The reliability coefficient using the KR – 21 formula is 0.90, which
means that the test has a very high reliability. Meaning, the test is very
good for a classroom test.
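The same computation can be reproduced step by step in Python (an illustration; statistics.variance is the sample variance, dividing by N − 1 as in the solution above):

```python
from statistics import mean, variance  # sample variance: divides by N - 1

scores = [16, 25, 35, 39, 25, 18, 19, 22, 33, 36, 20, 17, 26, 35, 39]
k = 40                       # number of items in the test
m = mean(scores)             # mean of the total scores: 27
s2 = variance(scores)        # 982/14, about 70.14
kr21 = (k / (k - 1)) * (1 - (m * (k - m)) / (k * s2))  # about 0.90
```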
Ms. Gonzaga administered a 20-item true or false test for her Grade VIII students in Los
Baños National High School. Below are the scores of 40 students; find the reliability using
the Kuder-Richardson 20 formula.
