To be a good test, a test should have adequate evidence of its validity, reliability, and accuracy. Validity: Does the test measure what it is supposed to measure? Reliability: Does the test consistently yield the same or similar scores (all other factors being equal)? Accuracy: Does the test fairly closely approximate an individual's true level of ability, skill, or aptitude?
Validity
A test has validity evidence if we can demonstrate that it measures what it says it measures. For example, if it is supposed to measure the ability to write behavioral objectives, it should measure that ability, not the ability to recognize bad objectives.
Content Validity
There are several ways of deciding whether a test has sufficient validity evidence. The simplest is content validity evidence. The content validity evidence for a test is established by inspection: test questions are examined to see whether they correspond to what the user decides the test should cover.
Content validity evidence answers the question: Does the test measure the instructional objectives? A test with good content validity evidence matches or fits the instructional objectives. Content validity is easiest to establish in an area such as achievement, where it is easy to specify what the content of the test should include.
It is more difficult if the concept being tested is a personality or aptitude trait, because it is hard to specify beforehand what a relevant question should look like. A test can sometimes look valid but measure something entirely different from what is intended, such as guessing ability, reading level, or skills acquired before instruction.
Criterion-Related Validity
There are two types of criterion-related validity evidence: concurrent and predictive.
Concurrent Validity
Concurrent validity evidence deals with measures that can be administered at the same time as the measure to be validated. Example: the Stanford-Binet and the Wechsler Intelligence Scale for Children-III (WISC-III) are IQ tests. A test publisher designing a short screening test that measures IQ might show that the new test is highly correlated with the Binet or the WISC-III and thus establish concurrent criterion-related validity.
Binet:  88 86 77 72 65 62 59 58
Test B: 37 34 32 26 22 21 19 16
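The strength of the relationship between the two sets of scores can be quantified with a Pearson correlation. The sketch below (plain Python; the function name is my own) computes the coefficient for the eight paired scores above; a value close to 1 would support the screening test's concurrent validity.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Paired scores from the example table above.
binet  = [88, 86, 77, 72, 65, 62, 59, 58]
test_b = [37, 34, 32, 26, 22, 21, 19, 16]

r = pearson_r(binet, test_b)
print(f"concurrent validity coefficient r = {r:.2f}")  # prints r = 0.99
```

A coefficient this high suggests the short test ranks examinees almost exactly as the Binet does, which is the evidence the publisher needs.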
Predictive Validity
Predictive validity evidence concerns how well a test predicts a criterion measured at a later time. The SAT, for instance, is frequently used to help decide who should be admitted to college. The predictive validity evidence of a test is determined by administering the test to a group of subjects, then measuring the subjects on whatever the test is supposed to predict after a period of time has elapsed.
The two sets of scores are then correlated, and the resulting coefficient is called a predictive validity coefficient. Both predictive and concurrent criterion-related validity evidence yield numerical indices of validity. Content validity evidence does not yield a numerical index; instead, it yields a logical judgment as to whether the test covers what it is supposed to cover.
Construct Validity
A test has construct validity evidence if its relationship to other information corresponds well with some theory. If a test is supposed to be a test of arithmetic computation skills, you would expect scores on it to improve after intensive coaching in arithmetic. If it is a test of mechanical aptitude, you might expect that mechanics would, on average, do better on it than poets.
A test should ideally do the job it is written to do. It should measure what it is supposed to measure. The following questions are equivalent: Is the test valid for the intended purpose? Does the test measure what it is supposed to measure? Does the test do the job it was designed to do?
Principle 2: Group variability affects the size of the validity coefficient. Higher validity coefficients are derived from heterogeneous groups than from homogeneous groups.
Principle 3: The relevance and reliability of the criterion should be considered in the interpretation of validity coefficients.
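Principle 2 (often called restriction of range) can be illustrated with synthetic data: when the same test-criterion relationship is examined in a homogeneous subgroup, the correlation shrinks. The data and function below are illustrative assumptions of mine, not from the source.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Synthetic example: criterion = test score plus a fixed,
# repeating error pattern (so the demo is deterministic).
scores = list(range(40))            # heterogeneous group: wide range
noise = [-4, 3, -2, 5] * 10
criterion = [s + e for s, e in zip(scores, noise)]

r_full = pearson_r(scores, criterion)

# Homogeneous subgroup: only examinees scoring 16 through 23.
sub = list(range(16, 24))
r_restricted = pearson_r([scores[i] for i in sub],
                         [criterion[i] for i in sub])

print(f"heterogeneous group:  r = {r_full:.2f}")        # prints 0.96
print(f"homogeneous subgroup: r = {r_restricted:.2f}")  # prints 0.71
```

The underlying relationship is identical in both groups; only the spread of test scores differs, yet the validity coefficient drops sharply in the restricted group. This is why coefficients computed on preselected (homogeneous) samples understate a test's validity.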