Operationalizing principles
1. Maximizing overall usefulness, rather than individual test qualities
2. Interdependence of the qualities
3. Appropriate balance is context-dependent
Reliability
Reliability refers to the consistency of test results.
Examples:
• The same test given on 2 different occasions
• 2 interchangeable forms of a test
• Student-related reliability
• Rater reliability
• Test administration reliability
• Test reliability
Factors contributing to the unreliability of a test
Rater reliability
• Subjectivity, bias
• Inter-rater reliability (inconsistent scores)
• Intra-rater reliability (not even-handed judgment)
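One way to put a number on inter-rater reliability is Cohen's kappa, which compares how often two raters agree with how often they would agree by chance. The slides do not name this statistic, so treat the sketch below as an illustration only; the function name and the band scores are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same scripts."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of scripts")
    n = len(rater_a)
    # Observed agreement: share of scripts given the same band by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' proportions for each band.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters use a single band")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical band scores (1-5) given by two raters to the same eight essays.
rater_1 = [3, 4, 2, 5, 3, 4, 1, 3]
rater_2 = [3, 4, 3, 5, 3, 3, 1, 3]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

A kappa near 1 means the raters agree far more often than chance would predict; a value near 0 means their agreement is about what chance alone would give.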
• Split-half reliability is another subtype of internal consistency reliability. The process of obtaining
split-half reliability is begun by “splitting in half” all items of a test that are intended to probe the
same area of knowledge in order to form two “sets” of items. The entire test is administered to a
group of individuals, the total score for each “set” is computed, and finally the split-half reliability
is obtained by determining the correlation between the two total “set” scores.
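The procedure described above can be sketched in a few lines. This is a minimal illustration, assuming items are scored 0/1 and that the two "sets" are formed from alternating items (one common way to split, not one the slides prescribe); the response data is made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(item_scores):
    """Correlate each test taker's total on one half with the total on the other."""
    set_a = [sum(row[0::2]) for row in item_scores]  # items 1, 3, 5, ...
    set_b = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson(set_a, set_b)


# Hypothetical data: one row per test taker, one column per item (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
r_halves = split_half_reliability(responses)
print(round(r_halves, 2))
```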
Reliability
How could Mawuse handle each of the situations below? Get out a sheet of
paper. Using what you’ve learned about reliability, brainstorm possible
answers to these questions.
• What we have to do is construct, administer and score tests in such a way that the scores
actually obtained on a test on a particular occasion are likely to be very similar to those
which would have been obtained if it had been administered to the same students with
the same ability, but at a different time.
• The more similar the scores would have been, the more reliable the test is said to be.
How to make the tests more reliable
1. Take enough samples of behaviour.
2. Exclude items which do not discriminate well between weaker and stronger students.
• Other things being equal, the more items that you have on a test,
the more reliable that test will be
• One thing to bear in mind, however, is that the additional items
should be independent of each other and of existing items.
• Each additional item should as far as possible represent a fresh
start for the candidate. By doing this we are able to gain
additional information on all of the candidates — information
that will make test results more reliable.
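The link between test length and reliability is often estimated with the Spearman-Brown prophecy formula. The slides do not mention this formula, so it is offered here only as a sketch; it assumes the added items are comparable in quality to the existing ones, and the starting reliability of 0.60 is a made-up figure.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`.

    `length_factor` is new length divided by old length (2 means doubled).
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a test with reliability 0.60 is doubled in length.
current = 0.60
doubled = spearman_brown(current, 2)
print(round(doubled, 2))
```

The gain shrinks as the test grows: each further doubling adds less than the one before, which is one reason to weigh extra items against the time they cost.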
#2 Exclude items which do not discriminate well between
weaker and stronger students
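A simple way to spot such items is a discrimination index: compare how many of the strongest and of the weakest test takers answered the item correctly. The sketch below is one common version of that index, not a method taken from the slides; the group size and all scores are hypothetical.

```python
def discrimination_index(item_correct, total_scores, group_size):
    """D = (correct in upper group - correct in lower group) / group_size."""
    if len(item_correct) != len(total_scores):
        raise ValueError("need one item result and one total score per test taker")
    if not 0 < group_size <= len(total_scores) // 2:
        raise ValueError("group_size must be between 1 and half the number of test takers")
    # Rank test takers from highest to lowest total score.
    ranked = sorted(zip(total_scores, item_correct), key=lambda pair: pair[0], reverse=True)
    upper = sum(correct for _, correct in ranked[:group_size])
    lower = sum(correct for _, correct in ranked[-group_size:])
    return (upper - lower) / group_size


# Hypothetical data: total test scores and 0/1 results on two items.
totals = [18, 17, 15, 12, 9, 7, 5, 4]
item_a = [1, 1, 1, 0, 1, 0, 0, 0]  # mostly answered correctly by stronger students
item_b = [1, 0, 1, 0, 1, 0, 1, 0]  # answered correctly about equally across levels
d_a = discrimination_index(item_a, totals, 3)
d_b = discrimination_index(item_b, totals, 3)
print(d_a, round(d_b, 2))
```

Items with an index near zero (or negative) tell us little about who is stronger, so they are candidates for exclusion or revision.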
• Construct validity (domain-specific construct)
• Content validity
• Criterion-related validity (predictive, concurrent)
• Face validity
• Consequential validity (effect on learners)
AUTHENTICITY
INTERACTIVENESS
• Whether and to what extent test tasks call for test takers’ language knowledge
and ability for successful task completion.
[Figure: interactiveness as the interaction among language ability (language knowledge and metacognitive strategies), topical knowledge, affective schemata, and the characteristics of the language test task]
Model of test usefulness
IMPACT
Backwash is the effect that tests have on learning and teaching.
Sample widely and unpredictably
• Testing only a restricted area will produce backwash only in that area.
• A wider range of tasks should be used in testing.
• Test across the full range of the specifications.
Use direct testing
Make testing criterion-referenced
Base achievement tests on objectives
Ensure test is known and understood by students and teachers
• The rationale (principles) for the test, its specifications, and sample items should be
made available to candidates.
Where necessary, provide assistance to teachers
• The test will not achieve its intended effect if teachers who need guidance, and possibly training, do not receive it.
• Where new tests are meant to help change teaching, support has to be given to
help effect the change.
Counting the cost
Essay writing
• Formula
Practicality = Ar / Nr
Ar: available resources
Nr: needed resources
Quotient ≥ 1: the test is practical
Quotient < 1: the test is not practical
(Bachman & Palmer, 1996)
• A test should be easy and cheap to construct, administer, score and interpret
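The practicality quotient from Bachman & Palmer (1996) is a simple ratio of available to needed resources. The sketch below shows the arithmetic; the function names and the rater-hour figures are hypothetical, and in practice the resources (people, time, materials) would first have to be expressed in a common unit.

```python
def practicality(available_resources, needed_resources):
    """Quotient of available resources over needed resources."""
    if needed_resources <= 0:
        raise ValueError("needed_resources must be positive")
    return available_resources / needed_resources


def is_practical(available_resources, needed_resources):
    """A test counts as practical when the quotient is at least 1."""
    return practicality(available_resources, needed_resources) >= 1


# Hypothetical example: scoring needs 50 rater-hours.
print(practicality(40, 50), is_practical(40, 50))  # only 40 hours available
print(practicality(60, 50), is_practical(60, 50))  # 60 hours available
```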
Summary