¡ The validity of a test is the extent to which
it measures what it claims to measure. It defines
the meaning of test scores (Gregory, 2011).

¡ The validity of a test concerns what the test
measures, and how well it does so (Anastasi, 1996).
¡ Validity can be defined as the agreement
between a test score or measure and the
quality it is believed to measure. It is
sometimes defined as the answer to the
question, “Does the test measure what it is
supposed to measure?” (Kaplan and
Saccuzzo, 2011)
¡ “A test is valid to the extent that
inferences made from it are appropriate,
meaningful, and useful.” (Standards for
Educational and Psychological Testing, 1999)
¡ Validity is always a matter of degree. Tests
may be useful or defensible for some
purposes and populations, but less so for
others.
¡ The validity of tests is NOT easily captured
by neat statistical summaries, but is instead
characterized on a continuum ranging from
weak to acceptable to strong.
¡ All procedures for determining test validity
are concerned with the relationships
between:
§ Performance on the test, and
§ Other independently observable facts
about the behavior characteristics under
observation
¡ The type of validity emphasized depends
on the purposes and consequences of
measurement.

1. Content Validity
2. Criterion-Related Validity
3. Construct Validity
¡ Content Validity is determined by the
degree to which the questions, tasks or
items on a test are representative of the
universe of behavior the test is designed to
sample. In theory, content validity is really
nothing more than a sampling issue.
§ Do the items adequately sample the
content domain of the construct or
constructs that the test purports to
measure?
§ Systematically examine test content to
determine whether it covers a
representative sample of the behavior
domain to be measured.
§ Content validity is usually established by a
careful examination of items by a panel of
experts in a given field. In effect, the test
developer asserts that “a panel of experts
reviewed the domain specification carefully
and judged the following test questions to
possess content validity.”
---------------------------------------------------------------------------------------------------------------------

Reviewer: __________________________________ Date: _________________

Please read carefully through the domain specification for this test. Next, please indicate
how well you feel each item reflects the domain specification. Judge a test item solely
on the basis of match between its content and the content defined by the domain
specification. Please use the four-point rating scale shown below:

1 (not relevant)   2 (somewhat relevant)   3 (quite relevant)   4 (very relevant)

(Sample Judge's Item-Rating Form for Determining Content Validity)


---------------------------------------------------------------------------------------------------------------------
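One common way to summarize such a rating form is to compute, for each item, the proportion of judges who rated it 3 or 4 (an item-level content validity index). The sketch below is a minimal illustration of that summary; the panel, items, and ratings are hypothetical, not taken from any actual form.

ratings = {
    # item_id: ratings given by five hypothetical judges on the 1-4 scale
    "item_01": [4, 4, 3, 4, 3],
    "item_02": [2, 3, 1, 2, 2],
    "item_03": [3, 4, 4, 4, 4],
}

for item, scores in ratings.items():
    relevant = sum(1 for s in scores if s >= 3)   # rated "quite" or "very" relevant
    cvi = relevant / len(scores)                  # proportion of judges endorsing the item
    print(f"{item}: content validity index = {cvi:.2f}")
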
§ The behavior domain to be tested must be
systematically analyzed to make certain
that all major aspects are covered by the
test items, and in the correct proportions.
§ It is also important to guard against any
tendency to overgeneralize regarding the
domain sampled by the test.
o For example, a multiple choice spelling test
may measure the ability to recognize correctly
and incorrectly spelled words. But it cannot be
assumed that such a test also measures ability
to spell correctly from dictation, frequency of
misspellings in written compositions, etc.
¡ Specific Procedures in Establishing Content Validity
§ Choice of appropriate items
§ Drawing up of test specifications
o It should show the content areas or topics to be
covered, the instructional processes or objectives to
be tested, and the relative importance of individual
topics and processes
o It should indicate the number of items of each kind to
be prepared for each topic.
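As a concrete (and entirely hypothetical) illustration of such a table of specifications, the sketch below maps content areas to the number of items planned per instructional objective and derives each topic's relative weight:

# Hypothetical table of specifications for a 40-item mathematics test.
blueprint = {
    "Fractions":   {"knowledge": 4, "comprehension": 6, "application": 6},
    "Decimals":    {"knowledge": 3, "comprehension": 4, "application": 5},
    "Percentages": {"knowledge": 3, "comprehension": 4, "application": 5},
}

total_items = sum(sum(counts.values()) for counts in blueprint.values())
print(f"Total items planned: {total_items}")
for topic, counts in blueprint.items():
    n = sum(counts.values())
    print(f"{topic}: {n} items ({n / total_items:.0%} of the test)")
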
§ Content validation is particularly
appropriate for domain-referenced tests.
Because performance on those tests is
interpreted in terms of content meaning, it
is obvious that content validation is a
prime requirement for their use.
§ Content validation is also applicable to
certain occupational tests designed for
employee selection and classification.
o Content validation is suitable when the test is an
actual job sample or otherwise calls for the same skills
and knowledge required on the job.
o In such cases, a thorough job analysis should be
carried out in order to demonstrate the close
resemblance between the job activities and the test.
¡ Criterion-Related Validity inquires into the
relationship (correlation) between scores on
a test or inventory, and other, external
criteria to which the test/inventory is
theoretically related.
¡ What is a criterion?
¡ A criterion is (1) the standard against which
the test is compared; (2) a direct and
independent measure of what the test is
designed to predict.
¡ Examples of criteria:
¡ (1) A test might be used to predict which engaged
couples will have successful marriages and which ones
will get divorced. Marital success is the criterion, but it
cannot be known at the time the couples take the
premarital test.
§ The reason for gathering criterion validity evidence is
that the test or measure is to serve as a “stand in” for
the measure we are really interested in.
¡ Examples of Criteria
¡ (2) A college entrance exam that is
reasonably accurate in predicting the
subsequent grade point average of
examinees would possess criterion-related
validity.
¡ Other Examples
Test Type                   Criterion
Driver Skill Test           Number of traffic citations received in the last 10 months
Social Readjustment Scale   Number of days spent in a psychiatric hospital in the last three years
Sales Potential Test        Peso amount of goods sold in the preceding year
Characteristics of a Good Criterion
1. Reliable
¡ It is a useful index of what the test measures
§ Example: The validity of the USTET can be studied
by computing the correlation (r) between
entrance exam scores and grade point averages
for a representative sample of students. In any
case, the resulting correlation coefficient is called
a validity coefficient
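A minimal computational sketch of that idea, using made-up entrance exam scores and grade point averages (the numbers are purely illustrative and not actual USTET data):

import numpy as np

# Hypothetical entrance exam scores and later grade point averages
# for ten students (illustrative only).
exam_scores = np.array([78, 85, 62, 90, 71, 88, 66, 95, 80, 74])
gpa         = np.array([2.8, 3.1, 2.2, 3.6, 2.5, 3.4, 2.4, 3.8, 3.0, 2.7])

# The correlation between test scores and the criterion is the
# validity coefficient.
validity_coefficient = np.corrcoef(exam_scores, gpa)[0, 1]
print(f"Validity coefficient r = {validity_coefficient:.2f}")
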
Characteristics of a Good Criterion
2. Appropriate for the test under investigation
¡ All criterion measures should be described
accurately, and the rationale for choosing them
as relevant criteria should be made explicit.
§ Example: In the case of “interest tests,” it is
sometimes unclear whether the criterion measure
should indicate satisfaction, success, or continuance
in the activities under question. The choice between
these subtle variants in the criterion must be made
carefully, based on an analysis of what the interest
test purports to measure.
Characteristics of a Good Criterion
3. Free of contamination from the test itself.
§ Criterion contamination is the term applied to a
criterion measure that has been based, at least in
part, on predictor measures.
¡ Example
Name of Test: “Inmate Violence Potential Test” – predicts a prisoner’s potential for violence in the cell block
Criterion: Ratings from fellow inmates, guards, and other staff, combined to come up with a number that represents each inmate’s violence potential
Validation: Asking guards to rate each inmate on their violence potential
§ Concurrent Validity – relationship between
test scores and an external criterion that is
measured at approximately the same time.
o Example 1: A test for determining skills in
logical reasoning is administered to a group of
students. Scores on this test are compared
with scores on another test on logical
reasoning of already known validity. If r is
high, then the test has concurrent validity.
o Example 2: An arithmetic achievement test
would possess concurrent validity if its scores
could be used to predict, with reasonable
accuracy, the current standing of students in a
mathematics course.
o Example 3: A personality inventory would
possess concurrent validity if diagnostic
classifications derived from it roughly matched
the opinions of psychiatrists or clinical
psychologists.
§ Predictive Validity – relationship between
test scores and an external criterion that is
measured somewhat later.
o Example 1: When scores on a math aptitude
test correlate highly with the final grades of
students in math, the aptitude test is said to
have high predictive validity.
o Example 2: An employment test can be
validated against supervisor ratings after six
months on the job.
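A sketch of that second design, assuming hypothetical test scores collected at hiring and supervisor ratings collected six months later (all values are invented for illustration):

import numpy as np

test_scores        = np.array([55, 72, 63, 81, 47, 90, 68, 76])          # at hiring
supervisor_ratings = np.array([3.0, 3.8, 3.2, 4.2, 2.6, 4.6, 3.5, 4.0])  # six months later

# Predictive validity: correlation between earlier test scores and the
# criterion measured later.
r = np.corrcoef(test_scores, supervisor_ratings)[0, 1]
print(f"Predictive validity coefficient r = {r:.2f}")

# A least-squares line can then be used to forecast the criterion for a
# new applicant from the test score alone.
slope, intercept = np.polyfit(test_scores, supervisor_ratings, 1)
print(f"Predicted rating for a score of 70: {slope * 70 + intercept:.2f}")
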
¡ Construct Validity is a judgment about the
appropriateness of inferences drawn from
test scores regarding individual standings on
a variable called a construct (Cohen and
Swerdlik, 2009). It is the extent to which the
test may be said to measure a theoretical
construct or trait (Anastasi, 1996).
¡ What is a construct?
§ A construct is an unobservable trait that is known
to exist. It is a theoretical, intangible quality in
which individuals differ (Gregory, 2011).
¡ Examples of constructs:

IQ, Anxiety, Leadership Ability, Hostility,
Motivation, Neuroticism, Self Esteem,
Scholastic Aptitude, Depression
¡ How can we be sure a test measures these, if
we can’t directly measure them?
§ Each construct is developed to explain and
organize observed response consistencies. It
derives from established interrelationships among
behavioral outcomes.
§ A test designed to measure a construct must
estimate the existence of an inferred, underlying
characteristic (e.g., leadership ability) based on a
limited sample of behavior. The appropriateness of
these inferences about the underlying construct is
what we mean by construct validity.
¡ All psychological constructs possess two
characteristics in common:
§ 1. There is no single external referent sufficient to
validate the existence of the construct; that is, the
construct cannot be operationally defined.
§ 2. Nonetheless, a network of interlocking
suppositions can be derived from existing theory
about the construct.
¡ Example: PSYCHOPATHY
Description: A personality constellation characterized by antisocial behavior (lying, stealing, occasional violence), lack of guilt or shame, and impulsivity.
Characteristic # 1: No single behavioral characteristic or outcome is sufficient to determine who is strongly psychopathic and who is not.
Characteristic # 2: A network of interlocking suppositions can be derived from existing theory about psychopathy.
¡ Characteristic # 1: On average, we
might expect psychopaths to be frequently
imprisoned, but so are many common
criminals. Furthermore, many successful
psychopaths somehow avoid apprehension
altogether. Psychopathy cannot be gauged
only by scrapes with the law.
¡ Characteristic # 2: The fundamental problem in
psychopathy is presumed to be a deficiency in the
ability to feel emotional arousal – whether empathy,
guilt, fear of punishment, or anxiety under stress. A
number of predictions follow from this appraisal. For
example, psychopaths should lie convincingly, have a
greater tolerance for physical pain, and get into
trouble because of their lack of behavioral inhibition.
Thus to validate a measure of psychopathy, we
would need to check out a number of different
expectations based on our theory of psychopathy.
¡ Construct validation requires the gradual
accumulation of information from a variety
of sources.
¡ The crucial point to understand about
construct validity is that “no criterion or
universe of content is accepted as entirely
adequate to define the quality to be
measured.”
¡ Approaches to Construct Validity

¡ 1. Developmental Changes
§ Age Differentiation – a major criterion
employed in validating traditional IQ tests.
§ Since abilities increase with age, it is logical
that test scores will also improve with age.
§ Stanford-Binet Test – checked against
chronological age to determine whether
scores show a progressive increase with
advancing age
§ Theory: Intelligence increases with age.
§ This is just one measure of construct validity,
but is NOT conclusive. In other words,
determining construct validity by
developmental changes alone is NOT sufficient.
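A minimal check of age differentiation, using invented raw scores for children of different ages: if the test behaves as the theory predicts, mean scores should rise with chronological age (and a positive age-score correlation is consistent with, but does not by itself establish, construct validity):

import numpy as np

# Hypothetical raw scores on an ability test, grouped by chronological age.
scores_by_age = {
    6:  [12, 14, 11, 13],
    8:  [18, 20, 17, 19],
    10: [25, 27, 24, 26],
    12: [31, 33, 30, 32],
}

for age, scores in scores_by_age.items():
    print(f"Age {age}: mean raw score = {np.mean(scores):.1f}")

ages = np.array([age for age, scores in scores_by_age.items() for _ in scores])
raw  = np.array([s for scores in scores_by_age.values() for s in scores])
print(f"r(age, score) = {np.corrcoef(ages, raw)[0, 1]:.2f}")
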
¡ 2. Correlations with Other Tests
§ Correlations between a new test and similar earlier
tests are sometimes cited as evidence that the new
test measures approximately the same general area
of behavior as the other test.
¡ Correlations should be moderately high, but not too
high. If the test correlates too highly with an already
available test, without added advantages such
as brevity or ease of administration, then the new
test represents needless duplication.
3. Factor Analysis
§ A refined statistical technique for analyzing
the interrelationships of behavioral data.
§ Factor analysis reduces a large number of
observed variables or test scores to a few
common factors.
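A small sketch of that idea using scikit-learn's FactorAnalysis on simulated item scores; the data, the six items, and the choice of two factors are all assumptions made for illustration, not part of any actual test:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 200

# Simulate six items: items 1-3 driven by one latent trait, items 4-6 by another.
trait_a = rng.normal(size=n)
trait_b = rng.normal(size=n)
items = np.column_stack(
    [trait_a + rng.normal(scale=0.5, size=n) for _ in range(3)]
    + [trait_b + rng.normal(scale=0.5, size=n) for _ in range(3)]
)

# Reduce the six observed variables to two underlying factors; the loading
# matrix shows which items group together on each factor.
fa = FactorAnalysis(n_components=2).fit(items)
print(np.round(fa.components_, 2))   # 2 x 6 matrix of factor loadings
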
4. Internal Consistency
§ The criterion is none other than the total
score on the test itself.
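One common way of using the total score as the criterion is to correlate each item with the total of the remaining items (a corrected item-total correlation). The responses below are invented 0/1 (wrong/right) data for illustration:

import numpy as np

responses = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
])

for i in range(responses.shape[1]):
    item = responses[:, i]
    rest = responses.sum(axis=1) - item        # total score with this item removed
    r = np.corrcoef(item, rest)[0, 1]          # corrected item-total correlation
    print(f"Item {i + 1}: item-total r = {r:.2f}")
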
5. Convergent and Divergent Validation
§ The test should correlate highly with other
measures of the same or similar constructs
(convergent validation), and show low
correlations with measures of dissimilar
constructs (divergent validation).
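A minimal check of that pattern, using invented scores on a new anxiety scale, an established anxiety scale (a similar construct), and a vocabulary test (a dissimilar construct); all names and data here are hypothetical:

import numpy as np

rng = np.random.default_rng(1)
n = 100

# The new scale and the established anxiety scale share a latent trait;
# the vocabulary test does not (simulated data).
anxiety = rng.normal(size=n)
new_scale         = anxiety + rng.normal(scale=0.6, size=n)
established_scale = anxiety + rng.normal(scale=0.6, size=n)
vocabulary_test   = rng.normal(size=n)

convergent = np.corrcoef(new_scale, established_scale)[0, 1]  # expected to be high
divergent  = np.corrcoef(new_scale, vocabulary_test)[0, 1]    # expected to be near zero
print(f"Convergent r = {convergent:.2f}, divergent r = {divergent:.2f}")
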
