
I. LANGUAGE ASSESSMENT
A. INTRODUCTION
1. Teaching and Testing

Many language teachers harbour a deep mistrust of tests and testers, since it is undeniable that a great deal of language testing is of poor quality. Language tests often have a harmful effect on teaching and learning, and they frequently fail to measure what they are intended to measure.

2. Backwash

WHAT IS BACKWASH?

Basically, backwash refers to the effect of testing on teaching and learning. This effect may be either harmful or beneficial.

Harmful backwash, as the name suggests, refers to the negative effects of testing on teaching and learning activities.

Examples:

• If a test is regarded as important, and the stakes are high, preparation for it can dominate teaching and learning activities.

• The test content and testing techniques may not be appropriate for measuring the intended learning outcomes.

• Students may develop negative attitudes toward tests.

Sometimes, however, backwash can be beneficial, having a positive impact on teaching and learning.

Examples:

• It helps in monitoring students’ performance

• It gives teachers the chance to assess their own teaching performance and make room for improvement

• It helps to determine whether to revise the curriculum and teaching styles for
the betterment of teaching and learning activities

Generally, backwash is the impact of assessment on teaching and learning.

3. The Need for Tests
▪ One conclusion that might be drawn from understanding why tests are so mistrusted by language teachers, and how this mistrust is often justified, is that we might be better off without language tests.
▪ Teaching is, after all, the primary activity; if testing comes into conflict with it, then it is testing that should go, especially when it has been admitted that so much testing provides inaccurate information.
▪ Teaching systems need dependable measures of language ability to provide
information about the achievement of groups of learners, without which it is
difficult to see how rational educational decisions can be made.
▪ Even without considering the possibility of bias, we have to recognize the need for a common yardstick, which tests provide, in order to make meaningful comparisons.
4. Reasons for Testing
a. Finding out about progress
• The type of test to be given will depend very much on the purpose of the testing. One should always ask oneself about the real purpose of the test to be given to the students.
• One major reason is to find out how well the students have mastered the
language areas and skills which have been taught. These tests look back at
what students have achieved and are called progress tests, the most important kind of test for a teacher.
• Progress tests should produce a cluster of high marks, for this is what is expected. But if most students fail, something must have been wrong with the teaching, the syllabus or the materials.
• It also acts as a safeguard against hurrying on to complete a syllabus or
textbook regardless of what the students are actually achieving--or failing to
achieve.
• A teacher has to avoid over-testing, although one should try to give progress tests regularly.
• The best progress test is one which students do not recognize as a test but
see as simply an enjoyable and meaningful activity.
b. Encouraging students
• This is one important function of a teacher-made test, to encourage
students.

• In learning a foreign language, it is often very difficult indeed for us to judge our own progress.
• A classroom test can help to show students the progress which they are
undoubtedly making. It can serve to show them each set of goals which they
have reached on their way to fluency.
c. Finding out about learning difficulties
• In teaching, we sometimes concentrate on following the syllabus and ignore the needs of our students, which can lead to failure.
• A good diagnostic test helps us to check our students’ progress and find the specific weaknesses and problems they may have encountered. One must be systematic when designing the test and must select areas where problems or weaknesses are likely.
• Usually a diagnostic test forms part of another type of test especially a
classroom progress test. As such, it is useful to regard diagnostic testing as
an ongoing part of the teaching and testing process.
• When marking a diagnostic test, one should try to identify and group together a student’s marks on particular areas of language (a minimal sketch of such grouping follows this list).
• Diagnostic tests of all kinds are essential if we wish to evaluate our teaching.
One can also evaluate the syllabus, the course book and the materials used.
• Whatever the reason, a classroom test can enable teachers to locate
difficulties and to plan appropriate remedial teaching.
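
As a minimal illustration of this kind of grouping, here is a short Python sketch. The language areas, items, and marks below are hypothetical; the point is only that per-item marks, once tagged by area, reveal where a student’s weaknesses cluster:

```python
# A minimal sketch of grouping diagnostic-test marks by language area.
# The areas, items, and marks below are hypothetical, for illustration.
from collections import defaultdict

# Each tuple: (language area the item tests, 1 = correct / 0 = incorrect)
item_results = [
    ("past tense", 1), ("past tense", 0), ("past tense", 0),
    ("articles", 1), ("articles", 1),
    ("prepositions", 0), ("prepositions", 1),
]

marks_by_area = defaultdict(list)
for area, mark in item_results:
    marks_by_area[area].append(mark)

# A low ratio flags an area needing remedial teaching.
for area, marks in marks_by_area.items():
    print(f"{area}: {sum(marks)}/{len(marks)} correct")
```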
d. Finding out about achievement
• An achievement test is similar to a progress test, but it is usually designed to cover a longer period of learning.
• Achievement tests should attempt to cover as much of the syllabus as possible, although the contents of the test will not reflect everything that has been learned.
• A test of achievement measures a student’s mastery of what should have
been taught.
• It is concerned with covering a sample which accurately represents the
contents of a syllabus or a course book.
• In setting an achievement test, it is necessary to base the test on the syllabus rather than simply on what you taught, in order to maintain a constant standard.
• In order to resolve problems regarding which topics to cover, it is helpful to cooperate with colleagues, as this accommodates different perspectives and helps define the fields to cover.

e. Placing Students
• A placement test enables us to sort students into groups according to their
language ability at the beginning of a course.
• Questions measuring general language ability can form a useful part of a
placement test.
• The most important part of the test should consist of questions directly
concerned with the specific language skills.
• A placement test should try to spread out the students’ scores as much as possible. In this way, it is possible to divide students into several groups according to their various ability levels, as the sketch below illustrates.
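
As a minimal illustration, a spread of placement scores can be cut into level groups with simple thresholds. This Python sketch uses hypothetical cut-off values and student data; a real placement test would set its boundaries from the institution’s own teaching levels:

```python
# A minimal sketch of placement grouping. The cut-off scores, group
# names, and student data are hypothetical, for illustration only.

def place_student(score: int) -> str:
    """Assign a class level from a 0-100 placement-test score."""
    if score >= 70:
        return "advanced"
    if score >= 40:
        return "intermediate"
    return "elementary"

placement_scores = {"Ana": 82, "Ben": 55, "Carla": 31}  # hypothetical
for name, score in placement_scores.items():
    print(f"{name}: {place_student(score)}")
```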
f. Selecting Students
• The purpose of this test is to compare the performances of all the candidates and select only the best. A selection test is often referred to as norm-referenced.
• A good selection test will usually spread out students’ scores over most of the scale that one is using (0%-100%).
• A norm-referenced test is used to show how a student’s performance
compares with the performance of other students in the same group.
• Selection tests are rarely set by the class teacher. They are usually set by outside examining bodies, and they have a washback effect (that is, they influence teaching and learning in the classroom). Teachers will gear their teaching closely to the exam: if the exam is good, this is useful; if it is bad, it has a damaging effect on teaching.
g. Finding out about proficiency
• Proficiency tests are used to measure how suitable candidates will be for performing a certain task or following a specific course.
• This test has different parts which candidates can choose to do according to
their different purposes.
• In designing a proficiency test, one should pay careful attention to those
language areas and skills which the candidate will need.
• The main concern is to find out the degree of success of an individual rather than to compare the abilities of the various candidates.

• A criterion-referenced test is used to find out whether a student can perform
a particular task or not.

B. KINDS OF TESTS AND TESTING


The four types of test are proficiency tests, achievement tests, diagnostic tests and
placement tests. This categorization will prove useful both in deciding whether an
existing test is suitable for a particular purpose and in writing appropriate new tests
where these are necessary.
1. Proficiency Tests
o ‘Proficient’ means having sufficient command of the language for a particular purpose.
o Proficiency tests are designed to measure people’s ability in a language regardless of any training they may have had in that particular language.
o A proficiency test is based on a specification of what candidates have to be able to do in the language in order to be considered proficient.
Example: a test designed to discover whether someone can function successfully as a UN translator.

But some proficiency tests are more general.

Example: the Cambridge Certificate of Proficiency in English.
▪ This test functions to show whether the candidates have reached a certain
standard with respect to a set of specified objectives.
▪ Such proficiency tests should have detailed specifications saying just what it is that successful candidates have demonstrated they can do.
▪ Despite differences in content and level of difficulty, all proficiency tests have one thing in common: they are not based on courses that candidates may have previously taken.

2. Achievement Tests

In contrast to proficiency tests, achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving the objectives.

Two Kinds of Achievement Tests

a) Final achievement tests

b) Progress achievement tests

Final achievement tests are those administered at the end of a course of study.

• They may be written and administered by ministries of education, official examining boards, or by members of the teaching institutions.
• In the view of some testers, the content of a final achievement test should be based directly on a detailed course syllabus or on the books and other materials used. This has been referred to as the “syllabus-content approach.” It has an obvious appeal, since the test only contains what it is thought that the students have actually encountered, and thus can be considered, in this respect at least, a fair test.

❖ Disadvantages of the Syllabus-Content Approach


• If the syllabus is badly designed or the books and other materials are badly chosen,
the results of a test can be very misleading.
• Successful performance on the test may not truly indicate successful achievement of
course objectives.

Examples:

• A course may aim to develop reading ability in German, but the test may limit itself to the vocabulary that the students are known to have met.
• A course may be intended to prepare students for university study in English, but the syllabus (and so the course and the test) may not include listening (with note-taking) to English delivered in lecture style on topics of the kind that students will have to deal with at university.

Test results will fail to show what the students have achieved in terms of course
objectives. The alternative approach is to base the test content directly on the objectives
of the course.

❖ This has a number of advantages:

1. It compels course designers to be explicit about objectives.

2. It makes it possible for performance on the test to show just how far students have achieved those objectives.

This in turn puts pressure on those responsible for the syllabus and for the selection of
books and materials to ensure that these are consistent with the course objectives.

Now it might be argued that basing the test content on objectives rather than on course content is unfair to students. If the course content does not fit well with the objectives, they will be expected to do things for which they have not been prepared. In a sense this is true. But in another sense it is not. If a test is based on the content of a poor or inappropriate course, the students taking it will be misled as to the extent of their achievement and the quality of the course.

Progress achievement tests are intended to measure the progress that students are making.

• One alternative way of measuring progress would be to establish a series of well-defined short-term objectives.
• These should make a clear progression toward the final achievement test based on
course objectives.
• Teachers should feel free to set their own “pop quizzes.” These will serve both to make a rough check on students’ progress and to keep students on their toes.
• Since such tests will not form part of formal assessment procedures, their construction and scoring need not be too rigorous.
• Nevertheless, they should be seen as measuring progress toward the intermediate
objectives on which the more formal achievement tests are based.
• They can, however, reflect the particular ‘route’ that an individual teacher is taking
towards the achievement of objectives.

3. Diagnostic Assessment
• It identifies weaknesses, strengths, and problems of students’ learning.
• Diagnostic assessment can be the teacher’s basis for planning what to do next in the teaching and learning process.
• The teacher will be able to design classroom activities that address students’ actual learning needs if he or she knows their strengths and weaknesses.

• They are intended primarily to ascertain what learning still needs to take place. At the level of broad language skills this is reasonably straightforward. We can be fairly confident of our ability to create tests that will tell us that someone is particularly weak in, say, speaking as opposed to reading in a language.
• We may be able to go further and analyze samples of a person’s performance in writing or speaking in order to create profiles of the student’s ability with respect to such categories as ‘grammatical accuracy’ or ‘linguistic appropriacy’. Indeed, raters of writing and oral test performance should provide feedback to the test takers as a matter of course.

But it is not easy to obtain a detailed analysis of a student’s command of grammatical structures, something that would tell us, for example, whether she or he had mastered the present perfect/past tense distinction in English. In order to be sure of this, we would need a number of examples of the choice the student made between the two structures in every context that we thought was significantly different and important enough to warrant obtaining information on.

The lack of good diagnostic tests is unfortunate. They could be extremely useful for
individualized instruction or self-instruction. Learners would be shown where gaps exist in their
command of the language, and could be directed to sources of information, exemplification and
practice.

Well-written computer programs will ensure that the learner spends no more time than
is absolutely necessary to obtain the desired information, and without the need for a test
administrator. Whether or not they become generally available will depend on the willingness of
individuals to write them and of publishers to distribute them.

4. Placement Tests

• A placement test is usually given to a student entering an educational institution to determine specific knowledge or proficiency in various subjects, for the purpose of assignment to appropriate courses or classes.
• Placement tests are intended to provide information that will help to place students at the stage of the teaching programme most appropriate to their abilities. Typically they are used to assign students to classes at different levels.
• Placement tests can be bought, but this is to be recommended only when the institution concerned is sure that the test being considered suits its particular teaching programme. One possible exception is placement tests designed for use by language schools, where the similarity of the popular textbooks used in them means that the schools’ teaching programmes also tend to resemble each other.
• The placement tests that are most successful are those constructed for particular
situations. They depend on the identification of the key features at different
levels of teaching in the institution. They are tailor-made rather than bought off
the peg. This usually means that they have been produced ‘in house’. The work
that goes into their construction is rewarded by the saving in time and effort
through accurate placement.

C. APPROACHES TO TEST CONSTRUCTION


1. Direct vs. Indirect Testing
Direct Testing
Testing is said to be direct when it requires the student to perform
precisely the skill we wish to measure.

Examples:

• If we want to know how well students can write compositions, we get them to write compositions.
• If we want to know how well they speak, we get them to speak.

The tasks, and the texts that are used, should be as authentic as possible. Every effort is made to make them as realistic as possible.

Direct testing has a number of attractions:

1.) It is relatively straightforward to create the conditions which will elicit the behaviour on which to base our judgments.
2.) The assessment and interpretation of students’ performance is also quite straightforward.
3.) There is likely to be a helpful backwash effect.

Indirect Testing

Indirect testing attempts to measure the abilities that underlie the skills in
which we are interested.

Examples:

• An indirect measure of writing ability might contain items in which the student has to identify which of several underlined elements is erroneous or inappropriate in formal standard English.
• Pronunciation ability might be tested with a paper-and-pencil test in which the student has to identify pairs of words which rhyme with each other.
▪ The main appeal of indirect testing is that it seems to offer the possibility of testing a representative sample of a finite number of abilities which underlie a potentially indefinitely large number of manifestations of them.
▪ The main problem with indirect tests is that the relationship between performance on
them and performance of the skills in which we are usually more interested tends to be
rather weak in strength and uncertain in nature.
▪ Some tests are referred to as semi-direct.
Example:
Speaking tests where students respond to tape-recorded stimuli, with their own responses being recorded and later scored. These tests are semi-direct in the sense that, although not direct, they simulate direct testing.

2. Discrete-Point vs. Integrative Testing


Discrete-point testing
▪ refers to the testing of one element at a time, item by item. This might, for example, take the form of a series of items, each testing a particular grammatical structure.
Integrative testing
▪ requires the candidate to combine many language elements in the completion of a task. This might involve writing a composition, making notes while listening to a lecture, taking a dictation, or completing a cloze passage.
Discrete-point tests will almost always be indirect, while integrative tests will tend to be direct. However, some integrative testing methods, such as the cloze procedure, are indirect.
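
Since the cloze procedure is cited here as an integrative (but indirect) method, a small sketch may help show what it involves. This minimal Python illustration deletes every n-th whitespace-separated word; real cloze tests usually leave an intact lead-in passage and choose deletions more carefully:

```python
# A minimal sketch of the cloze procedure: every n-th word is replaced
# by a blank and kept as the answer key. The deletion rate and the
# passage are illustrative; real tests refine both.

def make_cloze(passage: str, n: int = 7) -> tuple[str, list[str]]:
    words = passage.split()
    answers = []
    for i in range(n - 1, len(words), n):  # indices of every n-th word
        answers.append(words[i])
        words[i] = "_____"
    return " ".join(words), answers

text = ("Language tests can have a strong effect on teaching and "
        "learning, and this effect is known as backwash.")
cloze_text, answer_key = make_cloze(text, n=5)
print(cloze_text)
print(answer_key)
```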

3. Norm-Referenced vs. Criterion-Referenced Testing

Norm-Referenced Testing

• It is a test that indicates how a pupil’s performance compares to that of other pupils (Santos, R. 2007).

Criterion-Referenced Testing

• It is a test that indicates how a pupil’s performance compares to an established standard or criterion thought to indicate mastery of a skill (Santos, R. 2007).

Differences between Norm-Referenced Tests and Criterion-Referenced Tests

Norm-Referenced Tests

1. Norm-referenced tests are used to determine the achievements of individuals in comparison with the achievements of other individuals who take the same test.
2. In a norm-referenced test, the quality of achievement of a student is determined by the distance of his or her score from the mean or median.
3. Norm-referenced tests are designed to produce variability among individuals. To achieve this, items of varying difficulty are included in the test. Variability among the scores reflects good measurement, while homogeneity indicates, to some extent, poor measurement.
4. Norm-referenced tests are used for selection and grouping purposes.
5. In norm-referenced tests, non-discriminating items, such as items that are too easy, too difficult, or ambiguous, are removed or improved. Hence, sampling of test items is allowed and utilized.
6. In norm-referenced tests, relative placement indices are used to describe the relative placement of scores. Such indices include ranks, quartiles, means, medians, and the like.
7. In norm-referenced tests, learners may be allowed to tackle a higher-level learning task even though they have not fully mastered the preceding learning task.

Criterion-Referenced Tests

1. Criterion-referenced tests are used to determine the achievements of individuals in comparison with a criterion, usually an absolute standard.
2. In a criterion-referenced test, the quality of achievement of a student is determined by the distance of his or her score from the established criterion (see the sketch after this list).
3. In criterion-referenced tests, variability is irrelevant.
4. Criterion-referenced tests are used to determine individuals’ level of skill or knowledge, and whether they are capable or qualified to apply such skill or knowledge.
5. In criterion-referenced tests, items that are too easy or too difficult are not removed; rather, they should be included if they truly reflect the skill being measured.
6. In criterion-referenced tests, an individual’s score is simply above or below the standard or criterion.
7. In criterion-referenced tests, a pupil is not supposed to tackle higher learning tasks if he or she has not passed the standard set for the preceding learning task.
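
The two interpretations can be made concrete with a short Python sketch. It reads the same (hypothetical) pupil’s score both ways: the norm-referenced reading reports the distance of the score from the group mean in standard-deviation units, while the criterion-referenced reading simply compares the score against a fixed standard (here, an assumed cut-off of 75):

```python
# A sketch contrasting the two interpretations of one pupil's score.
# The class scores and the 75-point criterion are hypothetical.
from statistics import mean, stdev

class_scores = [48, 62, 70, 75, 81, 90]  # hypothetical group results
pupil_score = 81

# Norm-referenced reading: distance of the score from the group mean,
# expressed in standard deviations (a z-score).
z = (pupil_score - mean(class_scores)) / stdev(class_scores)
print(f"z-score relative to the group: {z:+.2f}")

# Criterion-referenced reading: simply above or below the standard.
criterion = 75
print("mastery" if pupil_score >= criterion else "non-mastery")
```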

4. Objective Testing vs. Subjective Testing


Objective Tests – If no judgment is required on the part of the scorer, then the scoring is objective. These are tests whose items have one and only one possible answer. The scoring of this type of test is easy because there is a one-to-one correspondence between examinees’ answers and what is specified in the answer key. The objectivity of scoring means that if one rater checks the papers today and another checks the same set of papers tomorrow, the scores will be the same. To elaborate further, no matter who checks the papers, and at what times and in what settings, similar results are gathered. A minimal sketch of such scoring follows.
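
This Python sketch uses a hypothetical answer key and set of responses; because each response either matches the key or does not, any scorer (or script) arrives at the same total:

```python
# A sketch of objective scoring: each response either matches the
# answer key or it does not, so every scorer gets the same total.
# The key and responses are hypothetical.

answer_key = ["B", "D", "A", "A", "C"]
responses  = ["B", "D", "C", "A", "C"]

score = sum(1 for given, correct in zip(responses, answer_key)
            if given == correct)
print(f"{score}/{len(answer_key)}")  # 4/5, whoever does the checking
```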

Subjective Tests – If judgment is called for, the scoring is said to be subjective. There are different degrees of subjectivity in testing. The impressionistic scoring of a composition may be considered more subjective than the scoring of short answers in response to questions on a reading passage. These are tests that require a tedious scoring task (e.g., an essay test). The difficulty of scoring makes it possible for one checker to rate a piece of work differently from another. This is a challenge in the use of subjective tests. To make the scoring of subjective tests more objective, the teacher should develop rubrics for scoring.

5. Computer Adaptive Testing
Computerized adaptive tests (CATs, sometimes called computer-adaptive tests) are designed to adjust their level of difficulty, based on the responses provided, to match the knowledge and ability of the test taker.
In most paper and pencil tests, the candidate is presented with all the items,
usually in ascending order of difficulty, and is required to respond to as many
of them as possible. This is not the most economical way of collecting
information on someone’s ability. People of high ability (in relation to the test
as a whole) will spend time responding to items that are very easy for them –
all, or nearly all, of which they will get correct. We would have been able to
predict their performance on these items from their correct response to more
difficult items. Similarly, we could predict the performance of people of low
ability on difficult items, simply by seeing their consistently incorrect
response to easy items. There is no real need for strong candidates to
attempt easy items, and no need for weak candidates to attempt difficult
items.

Computer adaptive testing offers a potentially more efficient way of collecting information on people’s ability. All candidates are presented initially with an
item of average difficulty. Those who respond correctly are presented with a
more difficult item; those who respond incorrectly are presented with an
easier item. The computer goes on in this way to present individual
candidates with items that are appropriate for their apparent level of ability
(as estimated by their performance on previous items), raising or lowering
the level of difficulty until a dependable estimate of their ability is achieved.
Oral interviews are typically a form of adaptive testing, with the interviewer’s prompts and language being adapted to the apparent level of the candidate. A toy sketch of the adaptive loop follows.
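
This is a minimal Python illustration, not an operational design: the item bank is reduced to a hypothetical 1-9 difficulty scale, the candidate is simulated, and the ability estimate is deliberately crude. Real CAT systems estimate ability with item response theory and calibrated item banks, but the raise-or-lower logic is the same:

```python
import random

# A toy sketch of computer adaptive testing. Difficulty runs from
# 1 (easiest) to 9 (hardest); the candidate is simulated; the ability
# estimate is a crude average of the difficulties presented.

def simulated_response(ability: float, difficulty: int) -> bool:
    """Pretend candidate: likelier to succeed when ability exceeds difficulty."""
    return random.random() < 1 / (1 + 2 ** (difficulty - ability))

def adaptive_test(ability: float, n_items: int = 10) -> float:
    difficulty = 5  # start every candidate at average difficulty
    presented = []
    for _ in range(n_items):
        correct = simulated_response(ability, difficulty)
        presented.append(difficulty)
        # Raise difficulty after a correct response, lower it otherwise.
        difficulty = min(9, difficulty + 1) if correct else max(1, difficulty - 1)
    return sum(presented) / len(presented)  # crude estimate of level

print(f"estimated level: {adaptive_test(ability=7):.1f}")
```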
