VALIDITY AND TEST VALIDATION

Prepared by Olga Simonova, Inna Chmykh, Svetlana Borisova, Olga Kuznetsova
Based on materials by Anthony Green
Validity

ABC Test of English
Results:
Ivana 45%
Irina 78%

Which student is better at English?
Validity

Some aspects of language ability may not be tested: construct under-representation.
[Diagram: assessment tasks cover only part of the language ability construct.]
Validity

Some abilities that are important to success in a test may not be connected to real-world language abilities:
• ability to cope with exam stress;
• awareness of how multiple-choice questions are written;
• willingness to guess, etc.
These are construct-irrelevant factors.
What is validity?

Tests are tools for helping us to make good decisions.

Construct relevance:
• a test of maths (even if it’s very reliable) can’t tell us about someone’s ability to sing;
• a test of written grammar can’t tell us much about someone’s ability to hold a conversation.
Construct representation:
• does the test cover all aspects of the relevant abilities?
What is validity?

‘validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests’
American Educational Research Association et al. (1999)

This means that test results can be valid for one purpose and for one particular population of test takers, but not for others. A test may be valid for placement purposes on a general language course, but not for employment selection.
Building a validation argument

What do we want the results to mean? What evidence can we collect to find out if scores really support this interpretation?
• evaluation – the test taker’s performance is a fair reflection of his/her abilities;
• generalization – similar scores would be obtained if the test taker was given a different form of the test, or if the raters scoring his/her performance were different;
• explanation – the test reflects a coherent theory of language ability;
• utilisation – the tested abilities are relevant to the decision being made about the test taker.
Validation in the assessment cycle:

• at different stages in the cycle, different questions need to be answered;
• different types of validity may be more relevant at each stage;
• tests made for different purposes raise different issues.
Building a validation argument:

• Evaluation – the test taker’s performance is a fair reflection of his/her abilities. Test form and administration.
• Generalization – similar scores would be obtained if the raters scoring his/her performance were different. Test score and rating scales.
• Explanation – the test reflects a coherent theory of language ability. Specification.
• Utilisation – the tested abilities are relevant to the decision being made about the test taker. Test purpose and target language use domain.
VALIDITY AND TEST VALIDATION
Validity in test design

“Tests for the measurement of language abilities must be constructed according to a coherent validity framework based on the latest developments in theory and practice.”
(Weir, 2005)
Socio-cognitive approach
(O’Sullivan & Weir, 2010)

[Framework diagram: context validity and cognitive validity bear on the test task; the task elicits a performance, to which scoring validity applies; consequential validity and criterion-related validity concern the uses made of the resulting scores.]
Content (context) validity

Content validity is based on subject experts' judgments of test content.
Does the content of the test adequately cover all the aspects of language ability we are interested in for making this decision?
Content (context) validity

A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned.
(Hughes, 2005)
The term content validity was traditionally used to refer to the content coverage of the task. Context validity is preferred as a more inclusive superordinate which signals the need to consider the discoursal, social and cultural context as well as the linguistic parameters under which the task is performed (its operations and conditions).
(Weir and Shaw, 2005)
Cognitive (or theory-based) validity

Do test takers go through the same mental processes when responding to test tasks as when they use language in the real world in the situations we are interested in?
Cognitive (or theory-based) validity

Theory-based validity involves collecting a priori evidence through piloting and trialling before the test event, for example through verbal reports from test takers on the cognitive processing activated by the test task, and a posteriori evidence involving statistical analysis of scores following test administration.
(Weir and Shaw, 2005)
Scoring validity

Scoring validity accounts for the extent to which test scores:
• are based on appropriate criteria;
• exhibit consensual agreement in their marking;
• are as free as possible from measurement error;
• are stable over time;
• engender confidence as reliable decision-making indicators.
(Weir and Shaw, 2005)
Scoring validity

Scoring validity = reliability
Are the test scores consistent enough for us to have confidence in the results?
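One common piece of scoring-validity evidence is an internal-consistency estimate such as Cronbach's alpha. The following is a minimal Python sketch, not part of the original materials and using hypothetical data, showing how the coefficient is computed from an item-score matrix.

```python
import numpy as np

def cronbach_alpha(scores):
    """Internal-consistency reliability; rows = test takers, columns = items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 0/1 item scores: six test takers, five items.
data = [[1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
        [1, 1, 1, 1, 1],
        [1, 0, 1, 1, 0],
        [0, 1, 0, 1, 0]]
print(f"alpha = {cronbach_alpha(data):.2f}")  # values near 1 suggest consistent scores
```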
Criterion-related validity

Criterion-related validity relates to the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidate's ability. This independent assessment is thus the criterion measure against which the test is validated.
(Hughes, 2003)
Are the results of the test consistent with other evidence we have about test takers’ abilities?
Criterion-related validity takes two forms: concurrent validity and predictive validity.
Concurrent validity

“involves the comparison of the test scores with some other measures of the same candidates taken at roughly the same time as the test.”
(Alderson et al., 1995:177)

Do scores on our test agree with the results of other tests of the same abilities?
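Concurrent validity evidence is usually reported as a correlation coefficient. Here is a minimal sketch with hypothetical scores for ten candidates on a new test and on an established benchmark taken at roughly the same time:

```python
import numpy as np

# Hypothetical scores for the same ten candidates on two tests
# taken at roughly the same time.
new_test  = np.array([45, 78, 62, 55, 90, 38, 70, 66, 81, 59])
benchmark = np.array([50, 74, 60, 58, 88, 42, 65, 70, 85, 55])

# Pearson correlation: a high r means the two tests rank the
# candidates similarly.
r = np.corrcoef(new_test, benchmark)[0, 1]
print(f"concurrent validity coefficient: r = {r:.2f}")
```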
Predictive validity

Predictive validity entails the comparison of test scores with some other measure for the same candidates taken some time after the test has been given.
(Alderson et al., 1995)
The degree to which a test can predict candidates' future performance.
(Hughes, 2003)
Did the test accurately predict which test takers were going to perform best in their jobs/in class/etc.?
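The evidence is again typically a correlation, this time against a criterion collected later. A minimal sketch under hypothetical assumptions (entry-test scores against supervisors' job-performance ratings gathered six months afterwards):

```python
import numpy as np

# Hypothetical data: entry-test scores for eight candidates, and
# supervisors' performance ratings of the same candidates six months later.
test_scores   = np.array([52, 85, 63, 47, 91, 70, 58, 77])
later_ratings = np.array([3.1, 4.6, 3.5, 2.8, 4.8, 4.0, 3.0, 4.2])

r = np.corrcoef(test_scores, later_ratings)[0, 1]
print(f"predictive validity coefficient: r = {r:.2f}")
```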
Consequential validity (impact)

Does the introduction and use of the test have the intended social consequences?

Is there any:
• bias in scoring and interpretation of results?
• unfairness in test use?
• positive or negative effect on teaching and learning?
Face validity

Face validity refers to the test's “surface credibility or public acceptability” (Alderson et al., 1995:172).
Bachman (1990:307) states that “face validity is the appearance of real life.”
Do test takers/teachers/politicians/the public generally believe in the value of the test?
Face validity

The assessment is credible to users: it looks as though it measures the skills or abilities of interest.
For example, a multiple-choice grammar test does not look as though it really tests the ability to speak English in real-world situations. All kinds of evidence could be used to show that people who pass the test are actually able to communicate effectively, but users may not be convinced because test takers are not actually required to speak. If the test does not have face validity, it is unlikely to be successful.
Construct validity

In recent years the term construct validity has been used to refer to the general, overarching notion of validity.
It is not enough to assert that a test has construct validity; empirical evidence is needed.
(Hughes, 2003)
The arguments for using the test as a reasonable justification for taking any decision must be presented and examined: this process is validation.
Round-up: suitable data for test validity

Face validity: questionnaires to and interviews with candidates, administrators and other users.
Context validity: a) compare test content with specifications/syllabus; b) questionnaires to and interviews with ‘experts’ such as teachers, subject specialists, applied linguists; c) expert judges rate test items and texts according to a precise list of criteria.
Cognitive validity: students introspect on their test-taking procedures, either concurrently or retrospectively; keystroke logs; eye-tracking.
Concurrent validity: a) compare students' test scores with their scores on another test; b) compare students' test scores with teachers' rankings; c) compare students' test scores with other measures of ability, such as teachers' ratings of the students.
Suitable data for test validity (continued)

Predictive validity: a) compare students' test scores with their scores on tests taken some time later; b) compare students' test scores with success in a final exam; c) compare students' test scores with other measures of their ability taken some time later, such as employers' assessments of their ability.
Construct validity: a) compare performance on each subtest with other subtests; b) compare performance on each subtest with the total of all other subtests; c) compare students' test scores with students' biodata and psychological characteristics; d) multitrait-multimethod studies; e) factor analysis.
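Option a) under construct validity can be illustrated in a few lines of Python. In this hypothetical sketch, the intercorrelation matrix shows whether subtests meant to tap related abilities do in fact pattern together:

```python
import numpy as np

# Hypothetical scores of eight students on three subtests
# (columns: reading, listening, grammar).
subtests = np.array([[68, 72, 70],
                     [45, 50, 48],
                     [80, 75, 82],
                     [55, 60, 52],
                     [90, 88, 85],
                     [62, 58, 65],
                     [73, 70, 76],
                     [50, 55, 47]])

# Correlation of each subtest with every other subtest
# (rowvar=False treats columns as variables).
print(np.round(np.corrcoef(subtests, rowvar=False), 2))
```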
Who is a validator?

Roles and example validity questions:
• Designers – Does the design of the test reflect an adequate theory of language?
• Producers – Is an appropriate balance of abilities required for success on the test?
• Organisers – Do the test items reflect the designers’ intentions?
• Administrators – Is the test organised and administered in a way that will ensure fairness?
• Assessees – Do assessees respond to the test tasks in a way that reflects realistic language processing?
• Scorers – Do scorers consistently and accurately capture the qualities of test takers’ performance?
• Users – Are decisions taken by users justified by the test?
Who is a validator?

Assessment developers (teachers, testing agencies):
• to check the quality of their own work;
• to showcase the quality of their tests.
Assessment users:
• to check that tests are giving them accurate and relevant information.
Independent agencies:
• to enforce/encourage good quality assessment.
Conclusion

• Test validation, according to Alderson et al. (1995:193), is 'time-consuming and difficult'.
• However, it is essential, as a test without validity cannot be useful as a decision-making tool.
• Applied linguists and teachers should focus more of their efforts on practical research in this field.
