A good language test has certain characteristics. Among the most important characteristics, or features, of language tests are the following:
1. Validity
2. Reliability
3. Practicality
4. Accuracy
5. Comprehensiveness
6. Relevance
7. Balance
8. Clarity
9. Authenticity
10. Appropriate for time
The first three characteristics, namely validity, reliability, and practicality, are the most important ones among the mentioned features and should be present in every test. So, let's try to define and classify them.
C- Criterion-related (empirical) validity: How well does the test measure what
you want it to?
This type is called empirical validity, or at other times criterion-related validity. It raises the question: how well does the test measure what you want it to? Empirical means that we are going to carry out an experiment to check the validity of the test.
Say, for instance, that I have an oral test. I aim to test my students' speaking skill. So, I have an oral test, and the number of students, say, is about 100. In order to examine my students' oral skill in all of its aspects, I need to look at how they speak, the way they pronounce the words, the way they use intonation, the way they stress the most important words, the way they hesitate, and so on.
So, this is the speaking skill. It is composed of many aspects. Accordingly, in order to be able, as a teacher and tester at the same time, to test all these aspects, I need about 45 minutes for each student. But I have 100 students in my class, and each one of them needs about 45 minutes. So, I need a very long time to complete the whole process of checking my students' oral, or speaking, skill. This is not realistic (I don't have that many lessons within which I can test my students' oral skills). So, what do I do? I usually give each student only a few minutes, say ten at most.
There’s a big difference between 10 minutes and 45 minutes.
Hence, in order to know whether this shortened test is valid or not, we are supposed to carry out an experiment. In this experiment, I am again going to check the speaking, or oral, skill of my students, but this time, instead of taking all 100 students, the whole class, I randomly select students from all the different levels. Say I take about five excellent, five very good, five good, five medium, and five poor students. Then I have about 25 students; out of 100, I take 25. I am going to give each one of those 25 students a complete 45 minutes, and I ask about the different subskills of the speaking skill.
Then I make a comparison between the first administration of the test, that is, 100 students each given 10 minutes, and the sample of 25 students, each of whom was given 45 minutes. I check whether the short test was valid in covering all the aspects of the speaking skill, and the results will tell me whether the teacher, or tester, was successful in covering all the aspects of that skill or not. So, empirical validity needs to be established in this way: I need to carry out an experiment to check whether the test is empirically valid or not. This does not usually take place in our schools. Though unrealistic, we still need to do this type of check whenever we are able to.
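The comparison in this experiment can be made concrete by correlating each sampled student's score on the short 10-minute test with the same student's score on the full 45-minute criterion test. Here is a minimal sketch in Python; the score lists and the `pearson` helper are illustrative assumptions, not data from the lecture:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for the 25 sampled students:
# the short 10-minute test vs. the full 45-minute criterion test.
short_test = [55, 60, 62, 70, 71, 48, 80, 85, 90, 66,
              58, 75, 77, 52, 68, 88, 45, 93, 64, 72,
              50, 83, 61, 79, 69]
full_test  = [58, 63, 60, 72, 75, 50, 78, 88, 92, 65,
              60, 74, 80, 55, 70, 85, 47, 95, 62, 70,
              53, 86, 64, 81, 71]

r = pearson(short_test, full_test)
print(f"criterion-related validity coefficient: r = {r:.2f}")
# A coefficient close to 1 suggests the short test ranks students
# much like the full 45-minute test does.
```

A high coefficient would support keeping the practical 10-minute format; a low one would suggest the short test misses aspects that the full test captures.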
Within criterion-related (empirical) validity, we have two types. The first is called concurrent validity; that is, we check what kind of performance our students show at that particular time of the test, because students' performance cannot be the same all the time. Sometimes students perform very well; at other times they are not quite satisfied with their performance.
The second type is predictive validity; that is to say, what will happen in the future. We predict the performance of our students; certain questions would tell you what kind of skills, knowledge, or developed abilities can be expected in the near future.
D - Construct validity: Are you measuring what you think you are measuring?
The last type of validity is called construct validity. In general, the word construct means an ability or skill.
So, a test, part of a test, or a testing technique is said to have construct validity if it can be demonstrated that it measures just the ability which it is supposed to measure. Back to our example of the oral skill: say I want to check your skill in speaking. Should I make an oral or a written test? Of course, I should make an oral test; it is the suitable kind of test for the speaking skill. Sometimes, however, particularly in ministerial examinations, some written questions are about pronunciation.
But this is not valid: when I want to test the speaking skill, I need to make a speaking test, an oral test, not a written one.
So, construct validity means that the test needs to measure the kind of skill, or ability, that it is supposed to measure.
Very simply put, reliability means consistency. That is to say, if I give a test and then repeat the same test with the same students at a different time and get similar results, I say the test is reliable. But when I get different results, I say that the test is unreliable. So, the reliability of a test is concerned with the stability of the test scores.
So, if I get the same results, or at least similar, approximate results, then I say the test is reliable. If not, then it is unreliable.
Think about this question.
Do you prefer to have a test composed of one question, or of many questions, say two to five?
- Standard conditions
Then here, my students are not under the same conditions; they are not under the same testing conditions.
So, standard conditions means we need to put our students, the testees, the
examinees, the ones who are taking the test, under the same conditions.
Say, for example, I have a similar situation, with the same number of students (100), but this time the test is not oral; it is a listening comprehension test. I have a certain passage, and I would like my students to listen to this passage played on a recorder. Then, after they listen to the passage, there are certain questions that they should answer.
Here, I have to make sure that the student who is near the recorder hears the passage just as clearly as the student at the back of the classroom.
I have to make sure of this, because if they cannot hear similarly and clearly, then how can I say that the test is reliable? It cannot, in fact, be reliable if the students are not under the same conditions. So, standard conditions means that your students should be under the same conditions.
- Standard tasks
Standard tasks mean that the students answer questions of the same level when I give them a task to perform or a question to answer.
For instance, students who are sons or daughters of the headmaster, the headmistress, or one of the teachers are sometimes given priority in examinations, especially oral ones, while the other students are given difficult questions.
Hence, you have to be fair in the tasks you give. Do not be biased.
- Standard scoring
If we have an objective test, such as MCQ, true/false, or matching questions, it is objectively scored, having only one correct answer, so we do not expect the tester to be biased in the process of scoring. In subjective tests, such as essay writing, students answer differently, so we are going to get different answers; such questions are considered subjective in the scoring process. So, standard scoring means that whenever we have an objective test, there is no problem: there is only one correct answer. In the case of subjective tests, such as compositions and essays, we have to make a scoring scheme, also called a scoring key or answer key. Within it, we say: this is the way I give the marks. If the student manages to produce correct language, free of grammatical and spelling mistakes, and the idea is clear, then I give such a mark; if not, then I give such a mark. This is a kind of scoring scheme, or answer key, that we have to prepare beforehand.
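A scoring scheme of this kind can be written out explicitly before marking begins. Here is a minimal sketch in Python; the criteria, band descriptors, and mark values are hypothetical examples, not a prescribed rubric:

```python
# Hypothetical analytic scoring key for an essay question (out of 10).
# Each criterion lists (description, marks) bands, agreed on beforehand.
scoring_key = {
    "grammar":  [("no grammatical mistakes", 4), ("a few mistakes", 2), ("frequent mistakes", 1)],
    "spelling": [("no spelling mistakes", 3), ("a few mistakes", 2), ("frequent mistakes", 1)],
    "content":  [("idea is clear and complete", 3), ("idea partly clear", 2), ("idea unclear", 1)],
}

def score_essay(bands_chosen):
    """Sum the marks for the band chosen under each criterion."""
    total = 0
    for criterion, band_index in bands_chosen.items():
        description, marks = scoring_key[criterion][band_index]
        total += marks
    return total

# An examinee judged: a few grammar mistakes, no spelling mistakes, clear idea.
print(score_essay({"grammar": 1, "spelling": 0, "content": 0}))  # 2 + 3 + 3 = 8
```

Writing the key down like this before scoring is what makes the subjective marking standard: every rater applies the same bands and the same marks.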
- Test-retest reliability
This is similar, in a sense, to multiple samples: we test our students and then we retest them with the same questions.
- Inter-rater reliability
This is related to the scoring, or rating, reliability.
Actually, in the rating, or scoring, reliability of a test, we have two types: inter-rater reliability and intra-rater reliability.
In the first one, inter-rater reliability, we have a test, and our students answer it. We collect their answer sheets and then distribute them among a group of raters, scorers, teachers who are scoring the test, say two to three raters. Then we check the consistency of the scoring across raters.
In the second type, intra-rater reliability, we give the same scorer the sheets to score, and then we give him the sheets to score again. That is to say, he or she is asked to score the same sheets twice or thrice. After that, we check whether the rater has been consistent in the process of rating or not.
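One simple way to quantify inter-rater reliability is to count how often two raters' marks for the same scripts fall within a small tolerance of each other. Here is a minimal sketch in Python; the marks and the `agreement_rate` helper are illustrative assumptions:

```python
def agreement_rate(rater_a, rater_b, tolerance=1):
    """Fraction of scripts where two raters' marks differ by at most `tolerance`."""
    agree = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance)
    return agree / len(rater_a)

# Hypothetical marks (out of 10) given by two raters to the same ten scripts.
rater_1 = [7, 5, 9, 4, 8, 6, 7, 3, 10, 6]
rater_2 = [7, 6, 8, 4, 9, 5, 7, 4, 9, 8]

print(f"inter-rater agreement: {agreement_rate(rater_1, rater_2):.0%}")
# → inter-rater agreement: 90%
```

The same function can check intra-rater reliability by passing the marks from a single rater's first and second scoring passes.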
- Internal consistency reliability
Internal consistency reliability is again related to the components of the test and how reliably they hang together in the process of scoring; it is related to test-retest and multiple-sample reliability.
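Internal consistency is commonly estimated with Cronbach's alpha, which compares the variance of the individual items with the variance of the total scores. Here is a minimal sketch in Python; the per-item scores are hypothetical:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a test.
    `items` is a list of per-item score lists, one inner list per item,
    aligned by student (items[i][j] = score of student j on item i)."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_var = sum(var(item) for item in items)
    totals = [sum(item[j] for item in items) for j in range(n)]
    return (k / (k - 1)) * (1 - sum_item_var / var(totals))

# Hypothetical scores of five students on three test items (each out of 5).
items = [
    [2, 4, 4, 5, 3],
    [3, 4, 5, 5, 2],
    [2, 5, 4, 4, 3],
]
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")  # ≈ 0.89
```

Values near 1 suggest the items are measuring the same underlying ability; low values suggest the items pull in different directions.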
The economics of the test: is the test economical or not? Are we going to pay money for somebody to photocopy or print the test sheets, for instance?
Relevance; The test should also be relevant: it measures reasonably well the achievement of the desired objectives.
Clarity; The questions should be clear. Some students have difficulty understanding the question words, so they ask others in the classroom, "What is the meaning of this word?" So, the questions need to be clear, and not only the questions: the instructions given by the teacher or tester also need to be clear. Students should know exactly what to do.
Authenticity; The material of the test should be authentic. That is the language
of the test should reflect everyday discourse and communication.
Appropriate for time; The test should be appropriate in time; that is, the questions should not be so lengthy that the students will not have enough time to answer them all. So, a good language test should be appropriate in length. It should also be appropriate in difficulty; that is to say, the test should be neither too hard nor too easy, and the questions should be progressive in difficulty, so as to reduce cheating, or to reduce stress and tension.
The test needs to be diagnostic; a good test is diagnostic, and the aim of diagnosis is to analyze the difficulties the students have, in particular, at the time of taking the test.
It should have utility; utility means usefulness, so a good test needs to be useful in various ways.
The first type of test is called the objective test. It usually comprises closed-ended questions, for which we expect only one correct answer.
The first technique is the MCQ; let us come to the construction of MCQ test items.
An MCQ question, item, or point (an item means a question or a point within a test) is composed as follows: the initial part of the item is called the stem. The stem can come in two different forms. The first form, which is the commoner one, is a question. The second form is a statement containing a blank, so we call it an incomplete statement.
The correct responses should not all appear under the same letter; do not make the correct response the second option in all the items, or choose B as the correct answer in every item. We need to vary the position of the correct responses, making them once A, once B, and so on, so that we decrease guessing.
Do not indicate the correct answer to the students by the length of the option; do not make the correct option noticeably lengthy.
If we follow these procedures, the test will look good.
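The advice about varying the position of the correct response can be automated when assembling a test. Here is a minimal sketch in Python, with a hypothetical item; the stem, key, and distractors are illustrative only:

```python
import random

def shuffle_options(stem, correct, distractors, rng=random):
    """Lay out an MCQ item with its options in random order,
    returning the printed item and the letter of the correct answer."""
    options = [correct] + list(distractors)
    rng.shuffle(options)
    letters = "ABCD"[: len(options)]
    key = letters[options.index(correct)]
    lines = [stem] + [f"  {letter}. {opt}" for letter, opt in zip(letters, options)]
    return "\n".join(lines), key

item, key = shuffle_options(
    "She ___ to school every day.",   # the stem, as an incomplete statement
    "goes",                            # the correct response
    ["go", "going", "gone"],           # the distractors
)
print(item)
print("key:", key)
```

Because each item is shuffled independently, the answer key no longer follows a pattern the students can exploit.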
In matching tests, the language should be clear. The directions should also be brief and clear, indicating the basis for matching the items in the two columns.
Usually, the items in the premises (the first column) tend to be short; we would like to concentrate our students' attention on the second column, the responses.
We say, for example,
Match the items in column A with their suitable responses in column B.
The two columns should appear on the same page, because it is confusing if some items appear on another page.
A disadvantage is the tendency to use this format for the simple recall of information, asking students only to remember facts.
Tests can be categorized into at least two categories: those which ask the students only to recognize the correct answer, such as MCQ, true/false, and matching tests, hence called recognition tests; and those which ask the students to produce, to write something, which we call production, or productive, tests.
4- Gap-filling.
A gap means a slot, an empty space, a blank that should be filled with the right information. It is objective, with one correct answer. It is usually used to test grammar (tenses, prepositions) and to check grammatical or vocabulary knowledge.
It is constructed as follows:
We may say
Write the number of the sentence or of the item and the letter of the most
suitable word that fills each blank.
Then, below the question, we open brackets, put different words, and close the brackets; after that we list a number of sentences containing blanks.
So, the student is supposed to read the sentences and complete them with the words we have already given between brackets. So, there is no creativity on the part of the student, only recognition.
5- Odd-one-out.
Odd means strange. This technique tells the students that one of these things is not like the others; it does not belong. One of certain shapes, colors, or figures is different, so we ask the students to take it out of the group.
We, for example, may say
Write the number of the item and the odd one out in each set of words.
Then, we give a number of words, say four.
For instance, we give three verbs and one preposition.
Or we give four squares and one triangle.
Or three similar colors and one different color.
6-Rearrangement:
This test format demands arranging a number of words to make a meaningful sentence, or arranging a series of sentences to make a meaningful and coherent piece of writing. The test is easy to design and can effectively assess the students' command of language, syntactically and semantically.
7-Labelling
Labelling means putting labels on pictures, charts, figures, or shapes. It is another type of objective test. The testee is required to label certain areas of a diagram or a picture which is accompanied by a text. The testees are asked to read information from the text and label the diagram or figure accordingly. As for the level, we use it with beginners, or maybe the early intermediate stage.
8-Grid (chart).
A grid, which means a chart, is another objective test. It is usually put in the form of a timetable or a chart, and we usually give a set of sentences. This set of sentences includes a certain word, maybe in contracted form; for example, "He's read the short story", meaning he has read the short story.
We write "He's read the short story", and then we put a chart with columns; within the columns, we write the possibilities of interpreting this apostrophe s: is it a possessive? Does it mean is or has? The student is required to communicate his understanding of the material presented in the boxes and respond by selecting or ticking the right box.
10-Transcoding
The last type of objective test is called transcoding. Usually, a word beginning with trans carries a sense of changing: a change is taking place from one thing to another, one shape to another, one medium to another; for example, transformation, transportation, translation, or, in our case, transcoding.
So, transcoding means we transfer a certain medium, a certain form of written material, to another. For instance, I give my students a description of a classroom: a paragraph describing whatever the classroom contains, say furniture. Then, below this paragraph, I draw a classroom picture and ask the students to transcode, that is, to transfer, the information in the paragraph into the drawing, into the picture. This is called transcoding. So, they transcode the written material into a drawing showing the parts of the classroom, or, say, the parts of an animal, a plant, etc. This technique is for high-level students.
Subjective testing techniques:
1. Composition writing
2. Letter writing
3. Essay writing
4. Precis writing
Returning to subjective testing techniques, we have four, all of which are written forms. Here we ask the students not just to recognize but to write, to produce: for example, to write a composition, a letter, or an essay, or to summarize.
There are two types of composition, restricted (guided) and free composition.
When we were freshmen, we studied guided, that is to say restricted, composition. Guided means that there is somebody guiding you in writing the composition; you are not free to write whatever you want. You are restricted by the number of words, the topic that you are supposed to write on, and the style that you should use in your writing, whether argumentative, descriptive, or narrative.
As sophomores, we studied free composition. As the name suggests, you are free now; you are not guided anymore, and you are free to express whatever you believe in writing on a certain topic.
Now, in such subjective questions, the teacher is supposed to give two or more topics to the students, because if I give only one topic, maybe the students do not know how to express their opinions concerning it, or may not understand it in the first place. Also, if the composition is guided, the teacher should restrict or limit the number of words.
A teacher may say: write a composition or an essay on one of the following topics (and you mention the topics); your composition should not exceed the limit of 300 words. So it is restricted.
In letter writing, a certain format is also usually given to the students, and they are asked to fill in the gaps in, for instance, a letter, given certain phrases. Or they write essays on a certain topic or subject matter.
A precis is a piece of writing in which students are asked to summarize a certain text. So, in all of these, we test the students' ability to write; the writing skill is under examination. These techniques are not highly reliable, because the scoring procedure is subjective. They may have content validity to a certain degree, if the content of the topics reflects the content of the syllabus.
3-Completion.
Completion usually comes in the form of an incomplete statement that needs to be completed; the blanks need to be filled with suitable information.
4-Transformation.
Transformation is usually related to grammatical structures. For instance, we give the students a paragraph written in the present tense and ask them to transform it into the past tense. We say: transform the following paragraph from the present simple tense to the past tense. This is transformation.
5-Gap-filling.
Gap-filling in semi-objective tests is different from gap-filling in objective tests. In objective gap-filling, we supply the students with the choices; they choose the suitable word and fill the gap, or blank, with it. In semi-objective gap-filling, however, the students are supposed to provide the right answer themselves; we do not provide them with it.