Testing: criteria for evaluating tests
Content validity
Refers to a test testing what it is supposed to test. In
constructing a test you should draw up a list of the skills,
structures, etc. that you want to test. Then devise the test
using this list. The test may not contain all of these things but
should contain a representative selection of them. This helps
avoid testing what is easy to test rather than what is important
to test.
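To make the idea of a representative selection concrete, here is a minimal sketch in Python; the specification, its groupings and the skill names are all invented for the example, not taken from the text.

```python
import random

# Hypothetical test specification, grouped by area
# (all names invented for illustration).
specification = {
    "grammar": ["present simple questions", "past simple narrative",
                "comparatives"],
    "reading": ["reading for gist", "reading for detail"],
    "listening": ["listening for specific information"],
    "writing": ["writing an informal email"],
}

random.seed(1)  # fixed seed so the selection is repeatable

# A representative selection: at least one item from every area,
# rather than only the items that are easiest to test.
selected = [random.choice(items) for items in specification.values()]

print("Items to include in the test:")
for skill in selected:
    print("-", skill)
```

Grouping the specification by area is one way to keep the sample representative rather than letting it drift towards whatever is easiest to test.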
Face validity
Refers to a test appearing to test what it is trying to test. This
is not a scientific concept; it refers to how the test appears to
the users. For example, if you aim to test a student's ability to
read and understand whole texts, it might appear strange to
do this by giving them a multiple-choice grammar test.
Test reliability
"| would get more or fess the:same results. (It would never be
‘This means that if the Same Students, with the same amount
of , took the same test at a different time they.
exactly the same because hurnans aren't like that.) The closer
the resutts, the more reliable the testIt is unlikely that
teachers designing tests will be able to test this kind of
reliability. If a student does surprisingly badly or well in a test,
what do you do? (A disadvantage of tests as the sole means
of assessment!)
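Although teachers are unlikely to test this themselves, one standard way to quantify test-retest reliability (an assumption here, not something the text prescribes) is to correlate the two sets of results. A minimal sketch with invented scores:

```python
from statistics import correlation  # available from Python 3.10

# Hypothetical scores for the same five students sitting the same
# test on two occasions (all numbers invented for illustration).
first_sitting = [62, 75, 48, 88, 55]
second_sitting = [65, 72, 50, 90, 53]

# Pearson correlation: the closer to 1.0, the closer the two sets
# of results, and so the more reliable the test.
r = correlation(first_sitting, second_sitting)
print(f"Test-retest correlation: {r:.2f}")
```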
Scorer reliability
This means that different markers or scorers would give the
same marks to the same tests. This is easy with discrete-item
tests such as multiple choice if there really is only one correct
answer and the markers mark accurately. But with, for
example, a piece of 'free writing', the marking may be more
subjective, particularly if the marker knows the students who
did the test. To improve scorer reliability you can use things
like clear guidelines for marking (criteria and points awarded),
standardisation meetings (to compare sample tests and
agree on what constitutes an A, B or a C, for example), or
double marking (two teachers mark each piece of work and
the score is averaged).
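The double-marking arithmetic is simple enough to sketch. The marks below are invented, and the threshold for flagging a large disagreement is an added assumption, not something the text prescribes:

```python
# Hypothetical marks out of 20 from two teachers marking the same
# pieces of work (all names and numbers invented for illustration).
marker_a = {"Ana": 14, "Ben": 11, "Cara": 17}
marker_b = {"Ana": 15, "Ben": 8, "Cara": 16}

GAP_LIMIT = 2  # assumed threshold for discussing a score, not from the text

for student in marker_a:
    a, b = marker_a[student], marker_b[student]
    final = (a + b) / 2  # double marking: the two scores are averaged
    note = ("  <- large gap, compare at a standardisation meeting"
            if abs(a - b) > GAP_LIMIT else "")
    print(f"{student}: {a} and {b} -> final mark {final:g}{note}")
```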