Validity

● A test is said to be valid if it accurately measures what it intends to measure.

Content validity

● A test is said to have content validity if its content constitutes a representative sample of the language skills and structures to which it is meant to pertain.
● To judge whether a test has content validity, we need a statement of the test’s purpose and a specification of the skills and structures that it is meant to cover.
● A comparison of test specification and test content is the basis for judgements as to content validity (a minimal sketch follows at the end of this section).

The importance of content validity lies in the following:

● The greater a test’s content validity, the more likely it is to be an accurate
measure of what it is supposed to measure.
● Such a test is also likely to have a beneficial backwash effect, since it focuses attention on what is important to test rather than on what is easy to test.
● The safeguard is to write full test specifications and ensure that the test content is
a fair reflection of these.
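
As a concrete illustration of comparing a specification with test content, here is a minimal sketch in Python. The skill labels, the item tagging, and the set-based coverage check are all invented for illustration; a real specification is far richer than a flat set of labels.

```python
# A minimal sketch of comparing a test specification with test content.
# The skill/structure labels and the item tagging are invented purely
# for illustration; a real specification is far richer than a flat set.
specification = {"past simple", "present perfect", "conditionals",
                 "passive voice", "reported speech"}

# The skill or structure each item on the draft test actually targets.
test_item_tags = ["past simple", "past simple", "present perfect",
                  "conditionals", "conditionals"]

covered = specification & set(test_item_tags)
missing = specification - covered

print(f"spec entries sampled: {len(covered)}/{len(specification)}")
print(f"not sampled (a threat to content validity): {sorted(missing)}")
```

Spec entries that no item samples are exactly the areas that risk being ignored in teaching, which is where the harmful backwash described above would come from.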

Criterion-related validity

● Another approach to test validity is to see how far results on the test agree with
those provided by some independent and highly dependable assessment of the
candidate’s ability. This independent assessment is thus the criterion measure
against which the test is validated.

Two kinds of criterion-related validity:

● Concurrent validity is established when the test and the criterion are administered at the same time. It is typically used for achievement or certification exams.
● The level of agreement between the two sets of scores is established through standard statistical procedures for comparing sets of scores, which generate what is called the “validity coefficient” (a mathematical measure of similarity; see the sketch after this list).
● Perfect agreement between two sets of scores results in a validity coefficient of 1. As rough benchmarks: coefficients from 0 to 0.6 fall below the minimum acceptable level, 0.6 to 0.8 is acceptable for evaluation in the classroom, and higher than 0.8 is acceptable for certification.
● Predictive validity concerns the degree to which a test can predict candidates’ future performance. It is used, for example, when a proficiency test is meant to predict a student’s ability to cope with graduate study at a university.
● Many factors besides language ability, such as subject knowledge, intelligence, and motivation, contribute to students’ final outcomes. This raises interesting issues about using the final outcome as the criterion measure: a coefficient of around 0.4 (only 16% agreement, since 0.4² = 0.16) is about as high as a validity coefficient can go, so the test’s accuracy in predicting language-related problems may go unrecognized.
● Predictive validity is also involved in the attempt to validate a placement test, which predicts the most appropriate class for each student. Validation would require comparing the number of misplacements (and their effect on teaching and learning) with the cost of developing and administering a test that would place students more accurately.
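
The sketch below shows one common way a validity coefficient can be computed: as a Pearson correlation between scores on the test and scores on the criterion measure, with the squared coefficient read as the percentage of agreement. The scores and the helper name validity_coefficient are invented for illustration.

```python
import math

def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and criterion scores."""
    n = len(test_scores)
    mean_t = sum(test_scores) / n
    mean_c = sum(criterion_scores) / n
    cov = sum((t - mean_t) * (c - mean_c)
              for t, c in zip(test_scores, criterion_scores))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in test_scores))
    sd_c = math.sqrt(sum((c - mean_c) ** 2 for c in criterion_scores))
    return cov / (sd_t * sd_c)

# Invented scores for ten candidates on the test and on the independent
# criterion assessment (e.g. a dependable rating of the same ability).
test = [55, 62, 70, 48, 81, 66, 74, 59, 90, 68]
criterion = [58, 60, 72, 50, 85, 63, 78, 61, 88, 70]

r = validity_coefficient(test, criterion)
print(f"validity coefficient: {r:.2f}")   # 1.0 would be perfect agreement
print(f"agreement (r squared): {r * r:.0%}")
```

With these invented scores the coefficient comes out well above 0.8, which against the benchmarks listed above would be acceptable even for certification.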

Construct validity

● A test, part of a test, or a testing technique is said to have construct validity if it can be demonstrated that it measures just the ability that it is supposed to measure.
● Common-sense constructs like reading and writing are unproblematic when tested directly: we can be confident that we are measuring a distinct and meaningful ability.
● Once we try to measure such an ability indirectly, however, we can no longer
take for granted what we are doing. We need to look at the theory for guidance
as to the form an indirect test should take, its content and its techniques. We
construct items that are meant to measure sub-abilities and administer them as a
pilot test.
● One step we would almost certainly take is to obtain extensive samples of the writing ability of the group to whom the test is first administered, and have these reliably scored. We would then compare scores on the pilot test with the scores on the writing samples (a coefficient of the kind described in the previous section can be calculated). High agreement would be evidence that we are measuring the intended ability with the test.
● To develop a satisfactory indirect test, we might administer a series of specially constructed tests, measuring each of the constructs by several methods. Coefficients could then be calculated between the various measures. If the coefficients between scores on the same construct are consistently higher than those between scores on different constructs, we have evidence that we are indeed measuring separate and identifiable constructs (see the sketch below).
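
Here is a minimal sketch of that convergent/divergent check. The constructs (grammar, vocabulary), the two methods, and every score are invented purely for illustration; statistics.correlation is the standard-library Pearson correlation (Python 3.10+).

```python
from itertools import combinations
from statistics import correlation  # Pearson r, Python 3.10+

# Invented scores: two hypothetical constructs, each measured by two
# different methods, for the same eight candidates.
measures = {
    ("grammar", "multiple_choice"):    [70, 55, 82, 64, 90, 58, 75, 68],
    ("grammar", "gap_fill"):           [72, 53, 80, 66, 88, 60, 73, 70],
    ("vocabulary", "multiple_choice"): [60, 74, 55, 81, 62, 79, 58, 66],
    ("vocabulary", "gap_fill"):        [63, 71, 57, 84, 60, 77, 61, 64],
}

for a, b in combinations(measures, 2):
    r = correlation(measures[a], measures[b])
    kind = "same construct" if a[0] == b[0] else "different constructs"
    print(f"{a} vs {b}: r = {r:+.2f} ({kind})")
```

With this invented data, the same-construct coefficients come out high and positive while the cross-construct coefficients do not, which is the pattern that would count as evidence for separate constructs.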

Face validity

● A test is said to have face validity if it looks as if it measures what it is supposed to
measure.
● A test with no face validity may not be accepted by candidates, teachers, education authorities, or employers. It may simply not be used; and if it is used, the candidates’ reaction to it may mean that they do not perform on it in a way that truly reflects their ability.

The use of validity

● Every effort should be made in constructing tests to ensure content validity.
● Tests should be validated empirically against some criterion.
● Where indirect testing is intended, reference should be made to the research literature to confirm that measurement of the relevant underlying constructs has been demonstrated using the testing techniques that are to be employed.
● Any published test should supply details of its validation.
