A debater grasps the arguments pro and con so well that he or she could speak
for either side. Or, shifting the metaphor to legal counselors, so well that
they could tell either party what is strong and weak in its position.
I propose here to extend to all testing the lessons from program evaluation.
What House (1977) has called "the logic of evaluation argument" applies, and I
invite you to think of "validity argument" rather than "validation research."
The argument must link concepts, evidence, social and personal consequences,
and values.
Most validity theorists have been saying that content and criterion validities
are no more than strands within a cable of validity argument. A favorable
answer to one or two questions can fully support a test only when no one cares
enough to raise further questions.
A hypothesis that fails is more likely to be amended than abandoned.
"What work is requirred to validate a test interpretation?". Nor can supporting
research in any amount immunize a theory against a future challenge based on new
and credible assumptions. As psychological science generates new concepts, test
interpretations will have to be reconsidered. Also because psychological and ed
ucational test influence who gets what in society, fresh challenges follow shift
s in social power or social philosophy. VALIDATION IS NEVER FINISHED.
An affirmative argument should make clear and, to the extent possible,
persuasive the construction of reality and the value weightings implicit in a
test and its application. To be plausible, an argument pro or con must fit with
prevailing beliefs and values, or successfully overturn them.
Questions about tests originate in five perspectives: the functional, the
political, the operationist, the economic, and the explanatory.
Functional perspective:
The literature on validation has concentrated on the truthfulness of test
interpretations, but the functionalist is more concerned with worth than truth.
In the very earliest discussions of test validity, some writers said that a
test is valid if it measures "what it purports to measure." That raised, in a
primitive form, a question about truth. Other early writers, saying that a test
is valid if it serves the purpose for which it is used, raised a question about
worth. Truthfulness is an element in worth, but the two are not tightly linked.
"Similarly, built in conservatism was what aroused latter-day objections to Stro
ng's blank for women, with its scores for occupations in which women were numero
us, the profile seemed to respond directly to typical questions fo female counse
lees. By hinting however, that the list of scales spanned the range of women's v
ocational options, the profile reinforced sex stereotypes.
Tests that impinge on the rights and life chances of individuals are inherently
disputable. We have come a long way from the naive testimony given Congress two
decades ago to the effect that, if sex life or religion correlates with a
criterion, psychologists who ask prospective employees about that are only
doing their duty. Representativeness of the jury weighed far more heavily on
the scales of justice than superior comprehension, said the judge.
Validators have an obligation to review whether a practice has appropriate
consequences for individuals and institutions, and especially to guard against
adverse consequences.
Nonprofessionals will do the evaluating of practices unaided if professionals
do not communicate sensibly to them. Whether institutions are treating
examinees fairly will be decided by the political-legal process, but the
profession ought to improve the basis for that decision.
Acceptance or rejection of a practice or theory comes about because a community
is persuaded. Even research specialists do not judge a conclusion as it stands
alone; they judge its compatibility with a network of prevailing beliefs.
Validity argument contributes when it develops facts and when it highlights
uncertainties of fact or implication. A community should be disputatious. Then
judgments are more likely to be as sound as present information allows.
The investigator should canvass all types of stakeholders for candidate
questions. Evaluators should resist pressure to concentrate on what persons in
power have specified as their chief question. The obvious example of too narrow
an inquiry is the validation of employment tests, which used to concentrate on
predicting a criterion the employer cared about and neglected to collect facts
on what kinds of applicants were most likely to be rejected by the test.
To win acceptance for tests that professionals view as fair, effective
communication is vital. Response to unfamiliar material must be examined to
assess thinking.
Until educators and testers convince students and the public that, in those
very subjects where excellence is most wanted, coping with the problematic is a
main objective, valid tests will be howled down as unfair. Matching tests to
the curriculum produces a spurious validity argument wherever the curricular
aim is no higher than to have students reproduce authority's responses to a
catechism.