VALIDITY AND TEST VALIDATION

Prepared by Olga Simonova, Inna Chmykh, Svetlana Borisova, Olga Kuznetsova
Based on materials by Anthony Green
Validity

ABC Test of English
Results:
Ivana 45%
Irina 78%

Which student is better at English?
Validity

Some aspects of language ability may not be tested: construct under-representation.
[Diagram: assessment tasks cover only part of the language ability construct.]
Validity

Some abilities that are important to success in a test may not be connected to real-world language abilities:
• ability to cope with exam stress;
• awareness of how multiple-choice questions are written;
• willingness to guess, etc.
These are construct-irrelevant factors.
What is validity?

Tests are tools for helping us to make good decisions.

Construct relevance:
• a test of maths (even if it’s very reliable) can’t tell us about someone’s ability to sing;
• a test of written grammar can’t tell us much about someone’s ability to hold a conversation.
Construct representation:
• does the test cover all aspects of the relevant abilities?
What is validity?

‘validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests’
American Educational Research Association et al. (1999)

This means that test results can be valid for one purpose and for one particular population of test takers, but not for others. A test may be valid for placement purposes on a general language course, but not for employment selection.
Building a validation argument

What do we want the results to mean? What evidence can we collect to find out if scores really support this interpretation?
• evaluation – the test taker’s performance is a fair reflection of his/her abilities;
• generalization – similar scores would be obtained if the test taker was given a different form of the test, or if the raters scoring his/her performance were different;
• explanation – the test reflects a coherent theory of language ability;
• utilisation – the tested abilities are relevant to the decision being made about the test taker.
Validation in the assessment cycle:

• at different stages in the cycle, different questions need to be answered;
• different types of validity may be more relevant at each stage;
• tests made for different purposes raise different issues.
Building a validation argument:

• Evaluation – the test taker’s performance is a fair reflection of his/her abilities. Test form and administration.
• Generalization – similar scores would be obtained if the raters scoring his/her performance were different. Test score and rating scales.
• Explanation – the test reflects a coherent theory of language ability. Specification.
• Utilisation – the tested abilities are relevant to the decision being made about the test taker. Test purpose and target language use domain.
VALIDITY AND TEST VALIDATION
Validity in test design

“Tests for the measurement of language abilities must be constructed according to a coherent validity framework based on the latest developments in theory and practice.”
(Weir, 2005)
Socio-cognitive approach
(O’Sullivan & Weir, 2010)

[Framework diagram: context validity and cognitive validity bear on the test task; the task elicits a performance, to which scoring validity applies; consequential validity and criterion-related validity concern the uses made of the resulting scores.]
Content (context) validity

Content validity is based on subject experts' judgments of test content.
Does the content of the test adequately cover all the aspects of language ability we are interested in for making this decision?
Content (context) validity

A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned.
(Hughes, 2005)
The term content validity was traditionally used to refer to the content coverage of the task. Context validity is preferred as a more inclusive superordinate which signals the need to consider the discoursal, social and cultural context as well as the linguistic parameters under which the task is performed (its operations and conditions).
(Weir and Shaw, 2005)
Cognitive (or theory-based) validity

Do test takers go through the same mental processes when responding to test tasks as when they use language in the real world in the situations we are interested in?
Cognitive (or theory-based) validity

Theory-based validity involves collecting a priori evidence through piloting and trialling before the test event, for example through verbal reports from test takers on the cognitive processing activated by the test task, and a posteriori evidence involving statistical analysis of scores following test administration.
(Weir and Shaw, 2005)
Scoring validity

Scoring validity accounts for the extent to which test scores:
• are based on appropriate criteria;
• exhibit consensual agreement in their marking;
• are as free as possible from measurement error;
• are stable over time;
• engender confidence as reliable decision-making indicators.
(Weir and Shaw, 2005)
Scoring validity

Scoring validity = reliability
Are the test scores consistent enough for us to have confidence in the results?
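One common piece of scoring-validity evidence is an internal-consistency estimate such as Cronbach's alpha. The following is a minimal Python sketch, not part of the original materials and using hypothetical data, showing how the coefficient is computed from an item-score matrix.

```python
import numpy as np

def cronbach_alpha(scores):
    """Internal-consistency reliability; rows = test takers, columns = items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 0/1 item scores: six test takers, five items.
data = [[1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
        [1, 1, 1, 1, 1],
        [1, 0, 1, 1, 0],
        [0, 1, 0, 1, 0]]
print(f"alpha = {cronbach_alpha(data):.2f}")  # values near 1 suggest consistent scores
```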
Criterion-related validity

Criterion-related validity relates to the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidate's ability. This independent assessment is thus the criterion measure against which the test is validated.
(Hughes, 2003)
Are the results of the test consistent with other evidence we have about test takers’ abilities?
Criterion-related validity takes two forms: concurrent validity and predictive validity.
Concurrent validity

“involves the comparison of the test scores with some other measures of the same candidates taken at roughly the same time as the test.”
(Alderson et al., 1995:177)

Do scores on our test agree with the results of other tests of the same abilities?
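Concurrent validity evidence is usually reported as a correlation coefficient. Here is a minimal sketch with hypothetical scores for ten candidates on a new test and on an established benchmark taken at roughly the same time:

```python
import numpy as np

# Hypothetical scores for the same ten candidates on two tests
# taken at roughly the same time.
new_test  = np.array([45, 78, 62, 55, 90, 38, 70, 66, 81, 59])
benchmark = np.array([50, 74, 60, 58, 88, 42, 65, 70, 85, 55])

# Pearson correlation: a high r means the two tests rank the
# candidates similarly.
r = np.corrcoef(new_test, benchmark)[0, 1]
print(f"concurrent validity coefficient: r = {r:.2f}")
```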
Predictive validity

Predictive validity entails the comparison of test scores with some other measure for the same candidates taken some time after the test has been given.
(Alderson et al., 1995)
The degree to which a test can predict candidates' future performance.
(Hughes, 2003)
Did the test accurately predict which test takers were going to perform best in their jobs/in class/etc.?
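The evidence is again typically a correlation, this time against a criterion collected later. A minimal sketch under hypothetical assumptions (entry-test scores against supervisors' job-performance ratings gathered six months afterwards):

```python
import numpy as np

# Hypothetical data: entry-test scores for eight candidates, and
# supervisors' performance ratings of the same candidates six months later.
test_scores   = np.array([52, 85, 63, 47, 91, 70, 58, 77])
later_ratings = np.array([3.1, 4.6, 3.5, 2.8, 4.8, 4.0, 3.0, 4.2])

r = np.corrcoef(test_scores, later_ratings)[0, 1]
print(f"predictive validity coefficient: r = {r:.2f}")
```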
Consequential validity (impact)

Does the introduction and use of the test have the intended social consequences?

Is there any:
• bias in scoring and interpretation of results?
• unfairness in test use?
• positive or negative effect on teaching and learning?
Face validity

Face validity refers to the test's “surface credibility or public acceptability” (Alderson et al., 1995:172).
Bachman (1990:307) states that “face validity is the appearance of real life.”
Do test takers/teachers/politicians/the public generally believe in the value of the test?
Face validity

The assessment is credible to users: it looks as though it measures the skills or abilities of interest.
For example, a multiple-choice grammar test does not look as though it really tests the ability to speak English in real-world situations. All kinds of evidence could be used to show that people who pass the test are actually able to communicate effectively, but users may not be convinced because test takers are not actually required to speak. If the test does not have face validity, it is unlikely to be successful.
Construct validity

In recent years the term construct validity has been used to refer to the general, overarching notion of validity.
It is not enough to assert that a test has construct validity; empirical evidence is needed.
(Hughes, 2003)
The arguments for using the test as a reasonable justification for taking any decision must be presented and examined: this process is validation.
Round-up: suitable data for test validity

Face validity: questionnaires to and interviews with candidates, administrators and other users.
Context validity: a) compare test content with specifications/syllabus; b) questionnaires to and interviews with ‘experts’ such as teachers, subject specialists, applied linguists; c) expert judges rate test items and texts according to a precise list of criteria.
Cognitive validity: students introspect on their test-taking procedures, either concurrently or retrospectively; keystroke logs; eye-tracking.
Concurrent validity: a) compare students' test scores with their scores on another test; b) compare students' test scores with teachers' rankings; c) compare students' test scores with other measures of ability, such as teachers' ratings of the students.
Suitable data for test validity (continued)

Predictive validity: a) compare students' test scores with their scores on tests taken some time later; b) compare students' test scores with success in a final exam; c) compare students' test scores with other measures of their ability taken some time later, such as employers' assessments of their ability.
Construct validity: a) compare performance on each subtest with other subtests; b) compare performance on each subtest with the total of all other subtests; c) compare students' test scores with students' biodata and psychological characteristics; d) multitrait-multimethod studies; e) factor analysis.
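Option a) under construct validity can be illustrated in a few lines of Python. In this hypothetical sketch, the intercorrelation matrix shows whether subtests meant to tap related abilities do in fact pattern together:

```python
import numpy as np

# Hypothetical scores of eight students on three subtests
# (columns: reading, listening, grammar).
subtests = np.array([[68, 72, 70],
                     [45, 50, 48],
                     [80, 75, 82],
                     [55, 60, 52],
                     [90, 88, 85],
                     [62, 58, 65],
                     [73, 70, 76],
                     [50, 55, 47]])

# Correlation of each subtest with every other subtest
# (rowvar=False treats columns as variables).
print(np.round(np.corrcoef(subtests, rowvar=False), 2))
```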
Who is a validator?

Roles and example validity questions:
• Designers – Does the design of the test reflect an adequate theory of language?
• Producers – Is an appropriate balance of abilities required for success on the test?
• Organisers – Do the test items reflect the designers’ intentions?
• Administrators – Is the test organised and administered in a way that will ensure fairness?
• Assessees – Do assessees respond to the test tasks in a way that reflects realistic language processing?
• Scorers – Do scorers consistently and accurately capture the qualities of test takers’ performance?
• Users – Are decisions taken by users justified by the test?
Who is a validator?

Assessment developers (teachers, testing agencies):
• to check the quality of their own work;
• to showcase the quality of their tests.
Assessment users:
• to check that tests are giving them accurate and relevant information.
Independent agencies:
• to enforce/encourage good quality assessment.
Conclusion

• Test validation, according to Alderson et al. (1995:193), is 'time-consuming and difficult'.
• However, it is essential, as a test without validity cannot be useful as a decision-making tool.
• Applied linguists and teachers should focus more of their efforts on practical research in this field.
