
Reliability

The reliability of a test refers to the consistency with which it yields the same rank for an individual taking the test more than once. A test is reliable if it consistently yields the same, or nearly the same, ranks over repeated administrations during which we would not expect the trait being measured to have changed. Educational tests may be very reliable, fairly reliable, or totally unreliable.

For instance, if a multiple-choice test given to a class were so difficult that everyone guessed at the answers, a student's rank would probably vary quite a bit from one administration to another, and the test would be unreliable. In testing, we want the information a test yields to be stable, consistent, and dependable.

There are several ways to estimate the reliability of a test. The three basic methods most often used are:

Test-retest (or stability)
Alternate form (or equivalence)
Internal consistency

Test-Retest (Stability)
The test is given twice, and the correlation between the first set of scores and the second set is determined. For example, a math test given to six students on Monday is given again the following Monday, with no math taught in between.

The six students make the following scores on the test:

Student   First test   Second test
1         75           78
2         50           62
3         93           91
4         80           77
5         67           66
6         88           88

The correlation between the two sets of scores is high, so the test is quite reliable.
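As a quick check, here is the correlation worked out in plain Python (a minimal sketch that simply computes Pearson's r for the two score columns above):

```python
# Test-retest scores for the six students from the table above.
first = [75, 50, 93, 80, 67, 88]
second = [78, 62, 91, 77, 66, 88]

n = len(first)
mean1, mean2 = sum(first) / n, sum(second) / n

# Pearson's r: covariance divided by the product of the standard deviations.
cov = sum((x - mean1) * (y - mean2) for x, y in zip(first, second))
sd1 = sum((x - mean1) ** 2 for x in first) ** 0.5
sd2 = sum((y - mean2) ** 2 for y in second) ** 0.5

r = cov / (sd1 * sd2)
print(f"r = {r:.2f}")  # prints r = 0.96: high, so the test looks quite reliable
```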

The main problem with test-retest reliability is that some memory or practice effect is usually involved the second time the test is taken, depending on whether the interval between the two tests is short or long. Generally, the longer the interval between test administrations, the lower the correlation.

Alternate Forms (Equivalence)


If there are two equivalent forms of a test, these forms can be used to obtain an estimate of the reliability of the test. Both forms are administered to a group of students, and the correlation between the two sets of scores is determined. This estimate eliminates the problems of memory and practice involved in test-retest estimates.

Large differences in students' scores on two forms of a test that supposedly measure the same behavior would indicate an unreliable test. To use this method of estimating reliability, two equivalent forms of the test must be available. The most critical problem with this method is that it takes a great deal of effort to develop one good test, let alone two.

Internal Consistency
If the test in question is designed to measure a single basic concept, it is reasonable to assume that people who get one item right will be more likely to get other, similar items right. Items ought to be correlated with each other, and the test ought to be internally consistent.

One approach to determining a test's internal consistency, called split halves, involves splitting the test into two equivalent halves and determining the correlation between them. This can be done by assigning all items in the first half of the test to one form and all items in the second half of the test to the other form.

That first-half/second-half split is only appropriate when items of varying difficulty are randomly spread across the test. A better approach is to divide the test by placing all odd-numbered items into one half and all even-numbered items into the other. This estimate is more commonly called the odd-even reliability.

To find the split-half (or odd-even) reliability, each item is assigned to one half or the other. Then the total score for each student on each half is determined, and the correlation between the two sets of half-test totals is computed.

In effect, a single test is used to make two shorter alternate forms. This method has the advantage that only one test administration is required, so memory and practice effects are not involved. Because each half is only half as long as the real test, the split-half (or odd-even) reliability coefficient should be corrected, or adjusted upward, to reflect the reliability the test would have if it were twice as long.
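Here is a minimal sketch of the whole procedure; the item matrix is invented for illustration (rows are students, 1 = correct, 0 = incorrect):

```python
# Odd-even split-half reliability with the Spearman-Brown correction.
items = [
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 1, 1, 1],
]

# Each student's total on the odd-numbered items (positions 1, 3, 5, ...)
# and on the even-numbered items (positions 2, 4, 6, ...).
odd_scores = [sum(row[0::2]) for row in items]
even_scores = [sum(row[1::2]) for row in items]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r_half = pearson(odd_scores, even_scores)  # reliability of a half-length test
r_full = 2 * r_half / (1 + r_half)         # Spearman-Brown correction

print(f"half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```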

The formula used for this correction is called the Spearman-Brown formula; for two halves, the corrected reliability is 2r/(1 + r), where r is the correlation between the halves. Another way of estimating the internal consistency of a test is through the Kuder-Richardson methods; the two most frequently used are KR-20 and KR-21.
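The following sketch shows how KR-20 and KR-21 are computed from a made-up matrix of right/wrong item scores; KR-20 uses each item's observed difficulty, while KR-21 is a shortcut that assumes all items are equally difficult:

```python
# KR-20 and KR-21 from dichotomous item data (rows = students, 1 = correct).
items = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1],
]

k = len(items[0])                     # number of items
totals = [sum(row) for row in items]  # each student's total score
n = len(totals)
mean = sum(totals) / n
var = sum((t - mean) ** 2 for t in totals) / n  # variance of total scores

# KR-20: sum each item's p (proportion correct) times q = 1 - p.
pq = 0.0
for i in range(k):
    p = sum(row[i] for row in items) / n
    pq += p * (1 - p)
kr20 = (k / (k - 1)) * (1 - pq / var)

# KR-21: replaces the item-by-item sum with a term based on the mean score.
kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))

print(f"KR-20 = {kr20:.2f}, KR-21 = {kr21:.2f}")
```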

Interpreting Reliability Coefficients


Principle 1: Group variability affects the size of the reliability coefficient. Higher coefficients result from heterogeneous groups than from homogeneous groups (see the simulation sketch after this list).

Principle 2: Scoring reliability limits test reliability.

Principle 3: All other factors being equal, the more items included in a test, the higher the test's reliability.

Principle 4: Reliability tends to decrease as tests become too easy or too difficult.
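As an illustration of Principle 1, this small simulation (with invented score distributions) gives the same test, with the same measurement error, to a heterogeneous group and a homogeneous group; the test-retest correlation comes out much higher for the heterogeneous group:

```python
import random

random.seed(1)

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def test_retest_r(spread):
    # True ability varies across 200 students by the given spread;
    # both testings add the same amount of random measurement error.
    true_scores = [random.gauss(75, spread) for _ in range(200)]
    first = [t + random.gauss(0, 5) for t in true_scores]
    second = [t + random.gauss(0, 5) for t in true_scores]
    return pearson(first, second)

print(f"heterogeneous group (spread 15): r = {test_retest_r(15):.2f}")
print(f"homogeneous group   (spread 3):  r = {test_retest_r(3):.2f}")
```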
