
TEST VALIDITY AND RELIABILITY

Notes for Trainee Teachers

by

Dr Richard Wambua

Department of Educational Psychology

Kenyatta University

© JULY 2022

OVERVIEW
Tests are prepared to measure educational achievement, which is an important step in the educational process. It is therefore essential that a test be of acceptable quality. The quality of a test depends on its validity and reliability, and these two aspects are considered in this topic.

Validity
is the effectiveness of a test in achieving a specified purpose. There are four different types of
validity.

Predictive Validity
is the extent to which scores on a test are consistent with criterion scores obtained at a later time. For example, educationists in Kenya are concerned with the predictive validity of the KCSE with respect to the criterion of university performance.

Concurrent Validity
is the extent to which test scores are consistent with criterion scores when the criterion and the predictor measures are taken at the same time. This means that there is no time gap between the two measures, unlike in the case of predictive validity, where the two measures are separated by a definite time interval. Both types of validity are measured using correlation coefficients.
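As a hedged illustration of how such a validity coefficient might be computed, the short Python sketch below correlates hypothetical KCSE points with hypothetical first-year university averages for ten students. All figures and variable names are invented for illustration and are not real data.

# A minimal sketch (invented data): predictive validity estimated as the
# Pearson correlation between a predictor (KCSE points) and a criterion
# (first-year university average). Requires Python 3.10+ for statistics.correlation.
from statistics import correlation

kcse_points = [81, 74, 68, 90, 77, 60, 85, 72, 66, 88]        # predictor scores
university_average = [68, 61, 55, 75, 66, 50, 70, 58, 57, 73]  # criterion scores

validity_coefficient = correlation(kcse_points, university_average)
print(f"Predictive validity coefficient: {validity_coefficient:.2f}")

A coefficient close to 1 would be taken as strong evidence of predictive validity, while values near 0 would indicate that the predictor tells us little about the criterion.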

Construct Validity
The term construct refers to a theoretical conceptualization about an aspect of human behaviour that cannot be measured directly; the ability is therefore defined by a theory. Construct validity is the degree to which a test measures the construct it claims to measure. It is estimated by showing that the scores obtained on the test correlate with scores obtained from tests of other constructs in the ways that the theory would predict. This type of validity is assessed using advanced forms of analysis such as factor analysis.

Content Validity
is concerned with the extent of curriculum coverage by a test. This is the most powerful evidence of validity that a classroom teacher can provide. A table of specifications is developed to ensure that the test is a valid measure of the instructional objectives and the course content. This ensures that the test is constructed systematically and provides a representative sample of student performance in each of the areas covered in the curriculum. The table of specifications is a two-way chart that relates course content to instructional objectives by giving the relative emphasis to be given to each learning outcome. Details on the construction of this table are given in the succeeding chapter.
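As an illustrative sketch only (the topics, objective levels, and item counts below are invented and are not taken from these notes), such a two-way chart can be laid out and checked as follows; the succeeding chapter gives the proper construction procedure.

# A minimal sketch (hypothetical topics and item counts): a table of
# specifications as a two-way layout of content areas against levels of
# instructional objectives, with each cell giving the number of items planned.
table_of_specifications = {
    "Measurement scales": {"Knowledge": 3, "Comprehension": 2, "Application": 1},
    "Validity":           {"Knowledge": 2, "Comprehension": 3, "Application": 2},
    "Reliability":        {"Knowledge": 2, "Comprehension": 3, "Application": 2},
}

total_items = sum(sum(cells.values()) for cells in table_of_specifications.values())
for topic, cells in table_of_specifications.items():
    topic_items = sum(cells.values())
    print(f"{topic}: {topic_items} items ({100 * topic_items / total_items:.0f}% of the test)")
print(f"Total items planned: {total_items}")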

Reliability
is defined as the precision (consistency) of a test. Reliability is expressed as a correlation coefficient of the relationship between two sets of scores obtained from two or more equivalent tests taken by the same individuals. Values lie between 0 and 1; high values (above 0.7) indicate that the test is of good quality, that is, it has low measurement error.
The most common methods of determining the reliability of a test are outlined as follows:

The test-retest method
the same test is given twice to the same group of test-takers within a short period of time, and the two sets of scores are correlated.

Equivalent forms
two tests that are equivalent (parallel) are administered to the same group of test-takers, and the two sets of scores are correlated.
To circumvent the use of two tests, a single test can be administered and its items split into two parts; the scores obtained on the two halves are then correlated. This technique is known as the split-half method and involves splitting the test into two comparable parts, for example by using the odd- and even-numbered items.
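The following sketch illustrates the split-half method on invented data for six pupils answering a ten-item test: the odd-numbered and even-numbered items are totalled separately and the two half-test scores are correlated. The Spearman-Brown adjustment at the end is standard practice for converting the half-test correlation into a full-length estimate, although it is not mentioned in the notes above.

# A minimal sketch of the split-half method on invented data:
# item_scores[p][i] is 1 if pupil p answered item i correctly, else 0.
from statistics import correlation

item_scores = [
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 1],
]

# Half-test totals: odd-numbered items (1st, 3rd, ...) and even-numbered items.
odd_totals = [sum(pupil[0::2]) for pupil in item_scores]
even_totals = [sum(pupil[1::2]) for pupil in item_scores]

r_half = correlation(odd_totals, even_totals)
# Spearman-Brown adjustment to estimate full-length reliability (standard practice).
r_full = 2 * r_half / (1 + r_half)
print(f"Half-test correlation: {r_half:.2f}; adjusted full-test estimate: {r_full:.2f}")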

The Coefficient Alpha or Cronbach Alpha (α)
measures the internal consistency of a test, that is, how the individual items relate to one another. In essence, it provides an estimate of the extent to which the individual items measure the same construct.
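The usual computing formula, which these notes do not write out, is α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below applies it to a small invented matrix of item scores.

# A minimal sketch: Cronbach's alpha from an invented matrix of item scores,
# using alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
from statistics import pvariance

item_scores = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

k = len(item_scores[0])                          # number of items
totals = [sum(pupil) for pupil in item_scores]   # each pupil's total score
item_columns = list(zip(*item_scores))           # scores on each item across pupils

item_variance_sum = sum(pvariance(column) for column in item_columns)
alpha = (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))
print(f"Cronbach's alpha: {alpha:.2f}")

For items scored simply right or wrong, as in this invented example, coefficient alpha is equivalent to the Kuder-Richardson formula 20, which appears as an option in question 10 of the exercise below.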

Inter-rater Reliability
is applied especially to essay test items, where a measure of reliability is obtained by correlating the scores given by two expert judges who score the test independently.

In general, reliability will be higher for:
(i) a long test than a short one;
(ii) a test composed of homogeneous items than one with heterogeneous items;
(iii) a test with more discriminating items;
(iv) a test with items of medium difficulty than one with either very easy or very difficult items.
Note, however, that although longer tests are generally more reliable, as stated in (i), this holds only if the time available is sufficient for all test-takers to complete the test. A quantitative illustration of point (i) follows.
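The quantitative illustration promised above is the Spearman-Brown prophecy formula; it is not stated in these notes but is the standard result behind point (i). If a test with reliability $r$ is lengthened by a factor $k$ using comparable items, the predicted reliability is

\[
r_k = \frac{k\,r}{1 + (k - 1)\,r}.
\]

For example, doubling the length ($k = 2$) of a test whose reliability is $r = 0.60$ gives

\[
r_2 = \frac{2 \times 0.60}{1 + (2 - 1) \times 0.60} = \frac{1.20}{1.60} = 0.75.
\]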

Conclusion
Reliability and validity are closely related. Reliability of a test is a requirement for its validity, though it is not a guarantee of it. For example, a test in Introduction to Psychology may provide reliable scores which nevertheless lack predictive validity with respect to performance in Educational Statistics and Evaluation.

Exercise

1. Test-retest reliability refers to the
a) Average of all correlations among items on a measure.
b) Correlation between scores on two administrations of a measure.
c) High correlation between scores on two measures designed to assess the same construct.
d) Low correlation between scores on two measures designed to assess different constructs.

2. Which of the following characteristics applies to a measurement that consistently discriminates between high and low scorers?
a) Validity b) Reliability
c) Precision d) Accuracy

3. If a trainee accountant takes the CPA section 3 exam four times and receives the same score all
four times, we may conclude that the exam is ______ but not necessarily _____.
a) Valid; Reliable b) Reliable; Accurate
c) Reliable; Valid d) Accurate; Valid

4. To obtain content-related evidence of validity, you would examine the
a) Adequacy of sampling of domains
b) Accuracy of the construct being measured
c) Nature of the criterion
d) Size of the correlation coefficient

5. A test is reliable if the scores are
a) Consistent b) Objective
c) Relevant d) Usable

6. Muigai developed a test to measure patriotism but later found that it only measured knowledge of the national anthem. This test lacks
a) Interpretability b) Reliability
c) Usability d) Validity

7. Inadequate test planning is most likely to influence the test’s
a) Length b) Objectivity
c) Reliability d) Validity

8. Reliability is measured using
a) Correlational analysis b) The z-score
c) Standard deviation d) Chi-square

9. Magati administered an aptitude test to some job applicants. He found, however, that some candidates had copied from others, and he had to cancel the results of the concerned candidates because of test __________ concerns.
a) Validity b) Reliability
c) Relevance d) Specification

10. The method of estimating reliability for essay-type tests is
a) Inter-rater b) Coefficient alpha
c) Kuder-Richardson d) Parallel forms

