A good language test has certain characteristics. Among the most important characteristics, or features, of language tests are the following:
1. Validity
2. Reliability
3. Practicality
4. Accuracy
5. Comprehensiveness
6. Relevance
7. Balance
8. Clarity
9. Authenticity
10. Appropriate for time
The first three characteristics, namely validity, reliability, and practicality, are the most important ones among the mentioned features and should be present in every test. So, let's try to define and classify them.
C- Criterion-related (empirical) validity: How well does the test measure what
you want it to?
This type is called empirical validity, or at other times criterion-related validity. It raises the question: how well does the test measure what you want it to? Empirical means that we are going to carry out an experiment to check the validity of the test.
Say, for instance, that I have an oral test. I aim to test my students' speaking skill. So, I have an oral test, and the number of students, say, is about 100. In order to examine my students' oral skill in all of its aspects, I need to look at how they speak, the way they pronounce the words, the way they use intonation, the way they stress the most important words, the way they hesitate, and so on.
So, this is the speaking skill. It is composed of many aspects. Accordingly, in order to be able, as a teacher and tester at the same time, to test all these aspects, I need about 45 minutes for each student. But I have 100 students in my class, and each one of them needs about 45 minutes. So, I need a very long time to complete the whole process of checking my students' oral, or speaking, skill. This is not realistic (I don't have that many lessons within which I can test my students' oral skills). So, what do I do? I usually give each student only a few minutes, say ten at most.
There’s a big difference between 10 minutes and 45 minutes.
Hence, in order to know whether this shortened test is valid or not, we are supposed to carry out an experiment. In this experiment, I am again going to check the speaking, or oral, skill of my students, but this time, instead of taking all 100 students, the whole class, I randomly select students from all the different levels. Say I take about five excellent, five very good, five good, five medium, and five poor students. Then I have about 25 students; out of 100, I take 25. I am going to give each one of those 25 students a complete 45 minutes, and I ask about the different subskills of the speaking skill.
Then I make a comparison between the first administration of the test, that is, 100 students each given 10 minutes, and the sample of 25 students, each of whom was given 45 minutes. I check whether the short test was valid in covering all the aspects of the speaking skill, and the results will tell me whether the teacher, or tester, was successful in covering all the aspects of that skill or not. So, empirical validity needs to be established in this way: I need to carry out an experiment to check whether the test is empirically valid or not. This does not usually take place in our schools. Though unrealistic, we still need to do this type of check whenever we are able to.
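The comparison in this experiment can be made concrete by correlating each sampled student's score on the short 10-minute test with the same student's score on the full 45-minute criterion test. Here is a minimal sketch in Python; the score lists and the `pearson` helper are illustrative assumptions, not data from the lecture:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for the 25 sampled students:
# the short 10-minute test vs. the full 45-minute criterion test.
short_test = [55, 60, 62, 70, 71, 48, 80, 85, 90, 66,
              58, 75, 77, 52, 68, 88, 45, 93, 64, 72,
              50, 83, 61, 79, 69]
full_test  = [58, 63, 60, 72, 75, 50, 78, 88, 92, 65,
              60, 74, 80, 55, 70, 85, 47, 95, 62, 70,
              53, 86, 64, 81, 71]

r = pearson(short_test, full_test)
print(f"criterion-related validity coefficient: r = {r:.2f}")
# A coefficient close to 1 suggests the short test ranks students
# much like the full 45-minute test does.
```

A high coefficient would support keeping the practical 10-minute format; a low one would suggest the short test misses aspects that the full test captures.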
Within criterion-related (empirical) validity, we have two types. The first is called concurrent validity; that is, we check what kind of performance our students show at that particular time of the test, because students' performance cannot be the same all the time. Sometimes students perform very well; at other times they are not quite satisfied with their performance.
The second type is predictive validity; that is to say, what will happen in the future. We predict the performance of our students; certain questions would tell you what kind of skills, knowledge, or developed abilities can be expected in the near future.
D - Construct validity: Are you measuring what you think you are measuring?
The last type of validity is called construct validity. In general, the word construct means an ability or skill.
So, a test, part of a test, or a testing technique is said to have construct validity if it can be demonstrated that it measures just the ability which it is supposed to measure. Back to our example of the oral skill: say I want to check your skill in speaking. Should I make an oral or a written test? Of course, I should make an oral test; it is the suitable kind of test for the speaking skill. Sometimes, however, particularly in ministerial examinations, some written questions are about pronunciation.
But this is not valid: when I want to test the speaking skill, I need to make a speaking test, an oral test, not a written one.
So, construct validity means that the test needs to measure the kind of skill, or ability, that it is supposed to measure.
Very simply put, reliability means consistency. That is to say, if I give a test and then repeat the same test with the same students at a different time and get similar results, I say the test is reliable. But when I get different results, I say that the test is unreliable. So, the reliability of a test is concerned with the stability of the test scores.
So, if I get the same results, or at least similar, approximate results, then I say the test is reliable. If not, then it is unreliable.
Think about this question.
Do you prefer to have a test composed of one question, or of many questions, say two to five?
- Standard conditions
Then here, my students are not under the same conditions; they are not under the same testing conditions.
So, standard conditions means we need to put our students, the testees, the
examinees, the ones who are taking the test, under the same conditions.
Say, for example, I have a similar situation, with the same number of students (100), but this time the test is not oral; it is a listening comprehension test. I have a certain passage, and I would like my students to listen to this passage played on a recorder. Then, after they listen to the passage, there are certain questions that they should answer.
Here, I have to make sure that the student who is near the recorder hears the passage just as clearly as the student at the back of the classroom.
I have to make sure of this, because if they cannot hear similarly and clearly, then how can I say that the test is reliable? It cannot, in fact, be reliable if the students are not under the same conditions. So, standard conditions means that your students should be under the same conditions.
- Standard tasks
Standard tasks mean that the students answer questions of the same level when I give them a task to perform or a question to answer.
For instance, students who are sons or daughters of the headmaster, the headmistress, or one of the teachers are sometimes given priority in examinations, especially oral ones, while the other students are given difficult questions.
Hence, you have to be fair in the tasks you give. Do not be biased.
- Standard scoring
If we have an objective test, such as MCQ, true/false, or matching questions, it is objectively scored, having only one correct answer, so we do not expect the tester to be biased in the process of scoring. In subjective tests, such as essay writing, students answer differently, so we are going to get different answers; such questions are considered subjective in the scoring process. So, standard scoring means that whenever we have an objective test, there is no problem: there is only one correct answer. In the case of subjective tests, such as compositions and essays, we have to make a scoring scheme, also called a scoring key or answer key. Within it, we say: this is the way I give the marks. If the student manages to produce correct language, free of grammatical and spelling mistakes, and the idea is clear, then I give such a mark; if not, then I give such a mark. This is a kind of scoring scheme, or answer key, that we have to prepare beforehand.
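A scoring scheme of this kind can be written out explicitly before marking begins. Here is a minimal sketch in Python; the criteria, band descriptors, and mark values are hypothetical examples, not a prescribed rubric:

```python
# Hypothetical analytic scoring key for an essay question (out of 10).
# Each criterion lists (description, marks) bands, agreed on beforehand.
scoring_key = {
    "grammar":  [("no grammatical mistakes", 4), ("a few mistakes", 2), ("frequent mistakes", 1)],
    "spelling": [("no spelling mistakes", 3), ("a few mistakes", 2), ("frequent mistakes", 1)],
    "content":  [("idea is clear and complete", 3), ("idea partly clear", 2), ("idea unclear", 1)],
}

def score_essay(bands_chosen):
    """Sum the marks for the band chosen under each criterion."""
    total = 0
    for criterion, band_index in bands_chosen.items():
        description, marks = scoring_key[criterion][band_index]
        total += marks
    return total

# An examinee judged: a few grammar mistakes, no spelling mistakes, clear idea.
print(score_essay({"grammar": 1, "spelling": 0, "content": 0}))  # 2 + 3 + 3 = 8
```

Writing the key down like this before scoring is what makes the subjective marking standard: every rater applies the same bands and the same marks.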
- Test-retest reliability
This is similar, in a sense, to multiple samples: we test our students and then we retest them with the same questions.
- Inter-rater reliability
This is related to the scoring, or rating, reliability.
Actually, in the rating, or scoring, reliability of a test, we have two types: inter-rater reliability and intra-rater reliability.
In the first one, inter-rater reliability, we have a test, and our students answer it. We collect their answer sheets and then distribute them among a group of raters, scorers, teachers who are scoring the test, say two to three raters. Then we check the consistency of the scoring across raters.
In the second type, intra-rater reliability, we give the same scorer the sheets to score, and then we give him the sheets to score again. That is to say, he or she is asked to score the same sheets twice or thrice. After that, we check whether the rater has been consistent in the process of rating or not.
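One simple way to quantify inter-rater reliability is to count how often two raters' marks for the same scripts fall within a small tolerance of each other. Here is a minimal sketch in Python; the marks and the `agreement_rate` helper are illustrative assumptions:

```python
def agreement_rate(rater_a, rater_b, tolerance=1):
    """Fraction of scripts where two raters' marks differ by at most `tolerance`."""
    agree = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance)
    return agree / len(rater_a)

# Hypothetical marks (out of 10) given by two raters to the same ten scripts.
rater_1 = [7, 5, 9, 4, 8, 6, 7, 3, 10, 6]
rater_2 = [7, 6, 8, 4, 9, 5, 7, 4, 9, 8]

print(f"inter-rater agreement: {agreement_rate(rater_1, rater_2):.0%}")
# → inter-rater agreement: 90%
```

The same function can check intra-rater reliability by passing the marks from a single rater's first and second scoring passes.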
- Internal consistency reliability
Internal consistency reliability is again related to the components of the test and how reliably they hang together in the process of scoring; it is related to test-retest and multiple-sample reliability.
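Internal consistency is commonly estimated with Cronbach's alpha, which compares the variance of the individual items with the variance of the total scores. Here is a minimal sketch in Python; the per-item scores are hypothetical:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a test.
    `items` is a list of per-item score lists, one inner list per item,
    aligned by student (items[i][j] = score of student j on item i)."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_var = sum(var(item) for item in items)
    totals = [sum(item[j] for item in items) for j in range(n)]
    return (k / (k - 1)) * (1 - sum_item_var / var(totals))

# Hypothetical scores of five students on three test items (each out of 5).
items = [
    [2, 4, 4, 5, 3],
    [3, 4, 5, 5, 2],
    [2, 5, 4, 4, 3],
]
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")  # ≈ 0.89
```

Values near 1 suggest the items are measuring the same underlying ability; low values suggest the items pull in different directions.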
The economics of the test: is the test economical or not? Are we going to pay money for somebody to photocopy or print the test sheets, for instance?
Relevance; The test should also be relevant: it measures reasonably well the achievement of the desired objectives.
Clarity; The questions should be clear. Some students have difficulty understanding the question words, so they ask others in the classroom, "What is the meaning of this word?" So, the questions need to be clear, and not only the questions: the instructions given by the teacher or tester also need to be clear. Students should know exactly what to do.
Authenticity; The material of the test should be authentic. That is the language
of the test should reflect everyday discourse and communication.
Appropriate for time; The test should be appropriate in time; that is, the questions should not be so lengthy that the students will not have enough time to answer them all. So, a good language test should be appropriate in length. It should also be appropriate in difficulty; that is to say, the test should be neither too hard nor too easy, and the questions should be progressive in difficulty, so as to reduce cheating, or to reduce stress and tension.
The test needs to be diagnostic; a good test is diagnostic, and the aim of diagnosis is to analyze the difficulties the students have, in particular, at the time of taking the test.
It should have utility; utility means usefulness, so a good test needs to be useful in various ways.
The first type of test is called the objective test. It usually comprises closed-ended questions, for which we expect only one correct answer.
The first technique is the MCQ; let us come to the construction of MCQ test items.
An MCQ question, item, or point (an item means a question or a point within a test) is composed as follows: the initial part of the item is called the stem. The stem can come in two different forms. The first form, which is the commoner one, is a question. The second form is a statement containing a blank, so we call it an incomplete statement.
The correct responses should not all appear under the same letter; do not make the correct response the second option in all the items, or choose B as the correct answer in every item. We need to vary the position of the correct responses, making them once A, once B, and so on, so that we decrease guessing.
Do not indicate the correct answer to the students by the length of the option; do not make the correct option noticeably lengthy.
If we follow these procedures, the test will look good.
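The advice about varying the position of the correct response can be automated when assembling a test. Here is a minimal sketch in Python, with a hypothetical item; the stem, key, and distractors are illustrative only:

```python
import random

def shuffle_options(stem, correct, distractors, rng=random):
    """Lay out an MCQ item with its options in random order,
    returning the printed item and the letter of the correct answer."""
    options = [correct] + list(distractors)
    rng.shuffle(options)
    letters = "ABCD"[: len(options)]
    key = letters[options.index(correct)]
    lines = [stem] + [f"  {letter}. {opt}" for letter, opt in zip(letters, options)]
    return "\n".join(lines), key

item, key = shuffle_options(
    "She ___ to school every day.",   # the stem, as an incomplete statement
    "goes",                            # the correct response
    ["go", "going", "gone"],           # the distractors
)
print(item)
print("key:", key)
```

Because each item is shuffled independently, the answer key no longer follows a pattern the students can exploit.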
In matching tests, the language should be clear. The directions should also be brief and clear, indicating the basis for matching the items in the two columns.
Usually, the items in the premises (the first column) tend to be short; we would like to concentrate our students' attention on the second column, the responses.
We say, for example,
Match the items in column A with their suitable responses in column B.
The two columns should appear on the same page, because it is confusing if some items appear on another page.
A disadvantage is the tendency to use this format for the simple recall of information, asking students only to remember facts.
Tests can be categorized into at least two categories: those which ask the students only to recognize the correct answer, such as MCQ, true/false, and matching tests, hence called recognition tests; and those which ask the students to produce, to write something, which we call production, or productive, tests.
4- Gap-filling.
A gap means a slot, an empty space, a blank that should be filled with the right information. It is objective, with one correct answer. It is usually used to test grammar (tenses, prepositions) and to check grammatical or vocabulary knowledge.
It is constructed as follows:
We may say
Write the number of the sentence or of the item and the letter of the most
suitable word that fills each blank.
Then, below the question, we open brackets, put different words, and close the brackets; after that we list a number of sentences containing blanks.
So, the student is supposed to read the sentences and complete them with the words we have already given between brackets. So, there is no creativity on the part of the student, only recognition.
5- Odd-one-out.
Odd means strange. This technique tells the students that one of these things is not like the others; it does not belong. One of certain shapes, colors, or figures is different, so we ask the students to take it out of the group.
We, for example, may say
Write the number of the item and the odd one out in each set of words.
Then, we give a number of words, say four.
For instance, we give three verbs and one preposition.
Or we give four squares and one triangle.
Or three similar colors and one different color.
6-Rearrangement:
This test format demands arranging a number of words to make a meaningful sentence, or arranging a series of sentences to make a meaningful and coherent piece of writing. The test is easy to design and can effectively assess the students' command of language, syntactically and semantically.
7-Labelling
Labelling means putting labels on pictures, charts, figures, or shapes. It is another type of objective test. The testee is required to label certain areas of a diagram or a picture which is accompanied by a text. The testees are asked to read information from the text and label the diagram or figure accordingly. As for the level, we use it with beginners, or maybe the early intermediate stage.
8-Grid (chart).
A grid, which means a chart, is another objective test. It is usually put in the form of a timetable or a chart, and we usually give a set of sentences. This set of sentences includes a certain word, maybe in contracted form; for example, "He's read the short story", meaning he has read the short story.
We write "He's read the short story", and then we put a chart with columns; within the columns, we write the possibilities of interpreting this apostrophe s: is it a possessive? Does it mean is or has? The student is required to communicate his understanding of the material presented in the boxes and respond by selecting or ticking the right box.
10-Transcoding
The last type of objective test is called transcoding. Usually, a word beginning with trans carries a sense of changing: a change is taking place from one thing to another, one shape to another, one medium to another; for example, transformation, transportation, translation, or, in our case, transcoding.
So, transcoding means we transfer a certain medium, a certain form of written material, to another. For instance, I give my students a description of a classroom: a paragraph describing whatever the classroom contains, say furniture. Then, below this paragraph, I draw a classroom picture and ask the students to transcode, that is, to transfer, the information in the paragraph into the drawing, into the picture. This is called transcoding. So, they transcode the written material into a drawing showing the parts of the classroom, or, say, the parts of an animal, a plant, etc. This technique is for high-level students.
Subjective testing techniques:
1. Composition writing
2. Letter writing
3. Essay writing
4. Precis writing
Returning to subjective testing techniques, we have four, all of which are written forms. Here we ask the students not just to recognize but to write, to produce: for example, to write a composition, a letter, or an essay, or to summarize.
There are two types of composition, restricted (guided) and free composition.
When we were freshmen, we studied guided, that is to say restricted, composition. Guided means that there is somebody guiding you in writing the composition; you are not free to write whatever you want. You are restricted by the number of words, the topic that you are supposed to write on, and the style that you should use in your writing, whether argumentative, descriptive, or narrative.
As sophomores, we studied free composition. As the name suggests, you are free now; you are not guided anymore, and you are free to express whatever you believe in writing on a certain topic.
Now, in such subjective questions, the teacher is supposed to give two or more topics to the students, because if I give only one topic, maybe the students do not know how to express their opinions concerning it, or may not understand it in the first place. Also, if the composition is guided, the teacher should restrict or limit the number of words.
A teacher may say: write a composition or an essay on one of the following topics (and you mention the topics); your composition should not exceed the limit of 300 words. So it is restricted.
In letter writing, a certain format is also usually given to the students, and they are asked to fill in the gaps in, for instance, a letter, given certain phrases. Or they write essays on a certain topic or subject matter.
A precis is a piece of writing in which students are asked to summarize a certain text. So, in all of these, we test the students' ability to write; the writing skill is under examination. These techniques are not highly reliable, because the scoring procedure is subjective. They may have content validity to a certain degree, if the content of the topics reflects the content of the syllabus.
3-Completion.
Completion usually comes in the form of an incomplete statement that needs to be completed; the blanks need to be filled with suitable information.
4-Transformation.
Transformation is usually related to grammatical structures. For instance, we give the students a paragraph written in the present tense and ask them to transform it into the past tense. We say: transform the following paragraph from the present simple tense to the past tense. This is transformation.
5-Gap-filling.
Gap-filling in semi-objective tests is different from gap-filling in objective tests. In objective gap-filling, we supply the students with the choices; they choose the suitable word and fill the gap, or blank, with it. In semi-objective gap-filling, however, the students are supposed to provide the right answer themselves; we do not provide them with it.