
The usual procedure for determining content validity may be described as follows: The teacher writes out the objectives of the test based on the table of specifications and then gives these together with the
test to at least two (2) experts along with a description of the intended test takers. The experts look at
the objectives, read over the items in the test and place a check mark in front of each question or item
that they feel does not measure one or more objectives. They also place a check mark in front of each
objective not assessed by any item in the test. The teacher then rewrites any item so checked and resubmits it to the experts, and/or writes new items to cover those objectives not heretofore covered by
the existing test. This continues until the experts approve of all items and also until the experts agree
that all of the objectives are sufficiently covered by the test.
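
The bookkeeping behind this back-and-forth can be sketched in a few lines of code. Below is a minimal Python sketch with hypothetical item and objective names; the check marks themselves come from the experts' judgment, not from the program:

```python
# Record-keeping for the content-validation cycle described above.
# All names and sets here are hypothetical stand-ins for the table of
# specifications and the experts' annotated copies of the test.

objectives = {"obj_A", "obj_B", "obj_C"}   # from the table of specifications

# Items the experts check-marked as not measuring any objective.
flagged_items = {"item_3"}

# Objectives the experts judged to be assessed by at least one item.
covered = {"obj_A", "obj_B"}

# Objectives check-marked as not assessed by any item in the test.
uncovered_objectives = objectives - covered

print("Rewrite and resubmit:", flagged_items)         # {'item_3'}
print("Write new items for:", uncovered_objectives)   # {'obj_C'}

# The cycle ends when both sets are empty, i.e. the experts approve
# every item and agree that every objective is sufficiently covered.
print("Experts approve:", not flagged_items and not uncovered_objectives)
```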

In order to obtain evidence of criterion-related validity, the teacher usually compares scores on the test in question with the scores on some other independent criterion test which presumably already has high
validity. For example, if a test is designed to measure mathematics ability of students and it correlates
highly with a standardized mathematics achievement test (external criterion), then we say we have high
criterion-related evidence of validity. In particular, this type of criterion-related validity is called
concurrent validity. Another type of criterion-related validity is called predictive validity, wherein the test
scores in the instrument are correlated with scores on a later performance (criterion measure) of the
students. For example, the scores of students on the mathematics ability test constructed by the teacher may be correlated with their later performance in a Division-wide mathematics achievement test.
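
As a concrete illustration, both kinds of criterion-related evidence reduce to computing a correlation between two sets of paired scores. A minimal Python sketch with made-up scores follows (it requires Python 3.10+, where statistics.correlation returns the Pearson r):

```python
# Criterion-related validity as a correlation between paired scores.
# The score lists below are invented for illustration only.
from statistics import correlation

teacher_test = [78, 85, 62, 90, 70, 55, 88, 74]   # teacher-made mathematics test
criterion    = [80, 88, 65, 93, 72, 60, 85, 70]   # standardized test, taken at the
                                                  # same time (concurrent validity)

# For predictive validity, the criterion scores would instead be
# gathered later, e.g. from a Division-wide achievement test.
r = correlation(teacher_test, criterion)
print(f"criterion-related validity coefficient r = {r:.2f}")
```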

Apart from the use of the correlation coefficient in measuring criterion-related validity, Gronlund suggested using the so-called expectancy table. This table is easy to construct and consists of the test (predictor) categories listed on the left-hand side and the criterion categories listed horizontally along the top of the
chart. For example, suppose that a mathematics achievement test is constructed and the scores are
categorized as high, average, and low. The criterion measure used is the final average grades of the
students in high school: Very Good, Good, and Needs Improvement. The two-way table lists down the number of students falling under each of the possible pairs of (test, grade), as shown below:

Test Score     Very Good     Good     Needs Improvement
High              20
Average                       25
Low                                          14

The expectancy table shows that 20 students obtained high test scores and were subsequently rated Very Good in terms of their final grades; 25 students obtained average scores and were subsequently rated Good in their finals; and 14 students obtained low test scores and were later graded as Needs Improvement. The evidence for this particular test tends to indicate that students getting high scores on it would later be graded Very Good, students getting average scores would later be rated Good, and students getting low scores on the test would later be graded as Needs Improvement.
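
A cross-tabulation of (test category, grade category) pairs produces the expectancy table directly. The minimal Python sketch below reproduces only the three counts reported in the text; the per-student pairs are reconstructed for illustration, and unreported cells print as 0:

```python
# Build an expectancy table by cross-tabulating (test, grade) pairs.
from collections import Counter

# One (test category, grade category) pair per student; only the
# counts stated in the text are reconstructed here.
pairs = (
    [("High", "Very Good")] * 20
    + [("Average", "Good")] * 25
    + [("Low", "Needs Improvement")] * 14
)
table = Counter(pairs)

rows = ["High", "Average", "Low"]                   # test (predictor) categories
cols = ["Very Good", "Good", "Needs Improvement"]   # criterion categories
print(f"{'':>8}" + "".join(f"{c:>20}" for c in cols))
for row in rows:
    print(f"{row:>8}" + "".join(f"{table[(row, c)]:>20}" for c in cols))
```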
We will not be able to discuss the measurement of construct-related validity in this book since the methods to be used require sophisticated statistical techniques falling under the category of factor analysis.

6.2. Validation

After performing the item analysis and revising the items which need revision, the next step is to validate the instrument, that is, to determine the characteristics of the whole test itself, namely, the validity and reliability of the test. Validation is the process of collecting and analyzing evidence to support the meaningfulness and usefulness of the test.

Validity. Validity is the extent to which a test measures what it purports to measure, or the appropriateness, correctness, meaningfulness and usefulness of the specific decisions a teacher makes based on the test results. These two definitions of validity differ in the sense that the first definition refers to the test itself while the second refers to the decisions made by the teacher based on the test. A test is valid when it is aligned with the learning outcome.

A teacher who conducts test validation might want to gather different kinds of evidence. There are
essentially three main types of evidence that may be collected: content-related evidence of validity,
criterion-related evidence of validity and construct-related evidence of validity. Content-related
evidence of validity refers to the content and format of the instrument. How appropriate is the content?
How comprehensive? Does it logically get at the intended variable? How adequately does the sample of
items or questions represent the content to be assessed?

Criterion-related evidence of validity refers to the relationship between scores obtained using the instrument and scores obtained using one or more other tests (often called the criterion). How strong is this relationship? How well do such scores estimate present performance or predict future performance of a certain type?

Construct-related evidence of validity refers to the nature of the psychological construct or characteristic being measured by the test. How well does a measure of the construct explain differences in the behavior of individuals or their performance on certain tasks?
