INTRODUCTION
It is important to fully understand the role and purpose of testing and evaluation
before we can discuss different ways of testing in TESL. In this topic, the roles
and purposes of testing and evaluation in TESL are discussed. This includes
a discussion of the differences between various terms related to basic
concepts in testing, the basic constituent parts of a test, and the role of tests in
the instructional and educational process, including the decisions that are made on
the basis of test scores.
The terms used in testing are obviously related to one another. However, what do
they mean and how are they interconnected? Before we proceed further into the subject of testing,
it is appropriate that we first understand several basic yet important terms.
Perhaps the most important of these are the terms test, assessment and
measurement. Let us first look at the definitions of these three terms.
(a) Test
• A test can be defined as a systematic procedure for measuring a sample
of behaviour by posing a set of questions in a unified manner (Linn
& Gronlund, 1995: 6). The key phrases in this definition are systematic
procedure, measuring a sample of behaviour, and a set of questions in a unified
manner.
• A test is a systematic procedure because it follows a planned format.
A test cannot be haphazard, as a haphazard test would lose much
of its credibility.
• A test also measures a sample of behaviour. In the case of language tests,
the sample of behaviour would be language proficiency or any other
language-related construct we are interested in.
• Finally, questions or items in a test are seen to be unified. A traditional
view of test items is that they work in the same way by measuring the
same construct. If items in a test are not unified and measure different
constructs, what then does the test measure?
(b) Assessment
• Assessment is any of a variety of
procedures used to obtain information
on students’ performance. Unlike a test,
an assessment is seldom exclusively
quantitative.
• A teacher may assess student learning
by simply looking at how students
respond to instruction.
• Students’ facial expressions can provide valuable information for
assessment.
• A test is an assessment, although, as noted here, not all assessments
are tests.
• It should also be noted that the term evaluation can be considered
synonymous with assessment, although some would limit its use to
programme evaluation rather than the evaluation of student performance.
For the sake of brevity, I will treat the two terms as synonymous.
(c) Measurement
Measurement is a numerical description of a
particular characteristic. We measure physical
objects in terms of their height, weight and
depth, and we can measure distance as well as length.
Tests, however, tend to measure behavioural and
cognitive characteristics, which are far more abstract
than physical objects. Nevertheless, all tests are
measurements, although, as we have seen, not all
measurements are tests.
From Figure 1.1, we can conclude that all tests are measurements. Similarly,
all tests are also assessments, although not all assessments are tests.
In the next few topics we will come across many different types of tests and
assessments. We will also examine measurements commonly used in tests.
activity 1.1
Share with your friends some tests that you find good or bad. What are the
features of a good test?
There are a number of ways in which we can look at a test. We may want to examine
the characteristics of a good test and the issues of validity and reliability. These
issues, however, will be discussed in Topic 7 of this module. Here, it may be more
important to look at the basic structure of a test. If we were to dissect a test and
examine its anatomy, what would it look like? Wesche (1983) suggested four major
parts to a test, which together form a useful framework for examining
any kind of test:
• Stimulus material.
• Task posed to the learner.
• Learner’s response.
• Scoring criteria.
the word level, the sentence level, and the discourse level. The process of reading
for understanding also involves other abilities and knowledge, such
as inferencing and cultural knowledge. Similarly, this component also
addresses how learners are expected to react mentally and cognitively to
the format of the question, and which skills, sub-skills and abilities they must
draw on in order to complete the task.
activity 1.2
So what exactly is the relationship between testing and teaching? Perhaps we can
try to get an initial idea with the help of the simple diagram in Figure 1.2.
The model suggested by Figure 1.2, however, is clearly a simplified and idealised
one. Such a model may work well if all three components are under the purview
of a single person or a small group of people. When it is applied to a
national scenario, however, the linear process is no longer so simple or likely. Some
of the objectives in the curriculum specifications may be lost in instruction,
especially as those who carry out the teaching may not be directly involved in
curriculum planning. Similarly, national standardised tests or examinations may
fail to capture the emphases placed during instruction, as those who construct
these examinations are not the people who actually carried out the teaching.
Nevertheless, for want of a better conceptual picture of the position of testing in
instruction, the simple model in Figure 1.2 will suffice for the moment. We
will revisit the model in later topics, when we will hopefully have a clearer and more
comprehensive understanding of tests and instruction.
It should be noted here that the nature of tests is affected by the nature of,
or approach to, instruction. We need only look at the history of language testing to see
the truth of this statement. It was once explained to me that language testing has
undergone three major historical shifts or phases.
• The first phase, the pre-scientific phase, coincided with a time when teachers
were thought to be competent in constructing tests simply by virtue of being
teachers. It was felt that if they could teach, then they could test.
• A more ‘scientific’ era, heralded by behaviourism and audiolingualism, saw
the rise of psychometric structuralism, in which the measurement of structural
knowledge of language was given top priority.
• Finally, language tests were influenced by the communicative approach,
and a sociolinguistic, integrative perspective in testing was
adopted.
Each of the three phases, of course, coincided with theories of and approaches
to language learning and teaching of the time. This further reinforces the notion
that there is a close relationship between teaching and testing.
Bloom’s taxonomy focuses on cognitive abilities and may have limitations when
used in language teaching and learning. Other taxonomies, such as Barrett’s
taxonomy, have been developed for more language-related skills. Barrett’s taxonomy
consists of four levels: literal recognition or recall; inference; evaluation; and
appreciation. Each level consists of several sub-levels. Barrett’s taxonomy
focuses on reading and is especially relevant for language teaching and learning.
However, taxonomies of the productive language skills of writing and speaking
are also needed. In second language situations, such taxonomies would be
useful in charting progress in learning as well as in specifying a comprehensive
teaching plan.
activity 1.3
The better students also use tests as a source of information. Feedback from test
scores informs students of their strengths and weaknesses, whether their study
approach has been beneficial, and whether they have understood the material taught.
In other words, information in the form of test results is as important for
the student as it is for the teacher. As such, it should be general practice to
return test papers as often and as quickly as possible. Seen another way,
teachers now have a new responsibility: to develop in their students the
ability and self-directedness to use information from sources such as test
results to plan their own learning.
Kubiszyn & Borich (2000) mention eight different types of decisions made on the
basis of information obtained from tests. These educational decisions are shown
in Figure 1.3.
The first three decisions are often within the domain of the classroom teacher. He
or she can make decisions with respect to instruction, grading as well as diagnostic
activities.
Instructional decisions are made based on test results when, for example, teachers
decide to change or maintain their instructional approach. If a teacher finds
out that most of his class have failed his test, he can react in many
ways. He could be very disappointed, blame the students for not
studying and punish them in some way. Of course, this would not be a wise decision.
Instead, the teacher could evaluate the effectiveness of his own teaching
or instructional approach. An instructional decision is made when the teacher
decides whether to keep or change the approach currently used. Perhaps the teacher
may decide that the approach is not suitable and that a different approach should be used.
Tests yield scores, and teachers have to decide what grades to give
students. As grades are indicators of student performance, teachers
need to decide whether a student deserves a high grade (perhaps an A) on the
basis of some form of assessment. Traditionally, and perhaps for a long time to
come, this assessment will be in the form of tests.
Sometimes, we give tests to find out the strengths and weaknesses of our
students. Can they correctly construct a passive sentence? Do they use the
different pronoun forms correctly? These kinds of questions can be answered
by observing student performance on tests. When a teacher decides that he
will spend more time teaching passive sentences because student performance
on such sentences in a test was unsatisfactory, then he has made a diagnostic
decision.
Counselling and guidance decisions are also made by relevant parties, such
as counsellors and administrators, on the basis of exam results. Counsellors
often give advice on appropriate vocations for some of their students.
This advice is likely to be based on the students’ own test scores.
Programme or curriculum decisions reflect the kinds of changes made to the
educational programme or curriculum based on examination results. Finally,
there are also administrative policy decisions that need to be made which are also
greatly influenced by test scores.
activity 1.4
What do you think are students’ reactions towards tests? Do they enjoy or
fear tests? Discuss with your coursemates.
The framework of a test is reflected in the way the test is constructed. The
first stage in constructing a test is to determine what is to be tested. This is not
as easy as it seems because it requires determining the theoretical construct
of what is to be tested. For example, let’s assume that we are interested in
testing communicative competence. This requires that a theoretical construct
of communicative competence be first determined. Various theories of
communicative competence have been suggested (cf. Bachman, 1990; Canale
& Swain, 1980). We need to examine these theories and determine what
communicative competence means to us for the purpose of our test.
The steps described above provide a general description of the test construction
process. In actual practice, there may be some additional steps that need to be
taken. Some time ago, I was asked to construct a test of English language
proficiency for a private company. When I set out to do the task, I listed the
steps that I would probably have to take. One of the first steps I felt necessary was some
form of needs analysis in order to determine what kind of language should be
tested. I wanted to find out from the management what sort of test they wanted
and whether what I had in mind fit their requirements. My intention was to draft
the test, show the draft to the management for approval, pilot it and later validate
the test in some way.
I would also imagine that if I were teaching in the public schools, I would
probably not spend so much time on the three steps described earlier – theoretical
construct, operationalisation, and quantification – because the test construction
process has largely been determined by the Ministry of Education. The national
standardised examination, the Sijil Pelajaran Malaysia (SPM), is already an embodiment
of the three stages, and teachers merely need to follow the model examination paper with
respect to these three elements. However, it may be helpful to construct a test
blueprint in order to ensure that my test spans the necessary content and that
a variety of skills or abilities is being tested.
Table 1.2: Test blueprint

Section          | Knowledge      | Comprehension | Application    | Analysis | Synthesis | Evaluation | Total
A: Comprehension | 1, 3           | 2, 4, 5       | 8              | 6        | 7, 10     | 9          | 10
B: Grammar       | 12, 16, 18, 19 | 11, 14, 17    | –              | 13       | 15, 20    | –          | 10
C: Functions     | 21, 23         | 22, 29        | 24, 25, 27, 28 | 26       | –         | 30         | 10
Total            | 8              | 8             | 5              | 3        | 4         | 2          | 30
There are numerous ways of forming test blueprints, some more comprehensive
than others (see Nitko, 2001 for several examples), but an important point to
remember is that the test blueprint should be used only as a tool rather than to
“promote exact or rigorous classification” (Nitko, 2001: 113). Nevertheless, the most
common form of test blueprint in Malaysian schools incorporates Bloom’s
taxonomy as its primary method of classification. In the example in Table 1.2, the
30 items in the test are categorised according to Bloom’s taxonomy. The numbers
1 to 30 in the blueprint refer to the test item numbers. Items 1 and 3 in Section A,
for example, are comprehension-section items that test knowledge, while item 8 tests
application. A blueprint such as this is useful in ensuring that different kinds of
questions are asked. In this particular example, the largest categories are knowledge
and comprehension questions (8 each, or 16 of the 30 items), which is quite common.
However, all six question types are reasonably well represented, and as such the test
itself can be considered acceptable.
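To make the tally behind a blueprint explicit, the following Python sketch records how many items each section devotes to each level of Bloom’s taxonomy and then sums the columns. It is an illustration under stated assumptions, not a prescribed tool: the data layout and the names `blueprint`, `level_totals`, `section_totals` and `missing_levels` are hypothetical, and the counts are taken from the example blueprint in Table 1.2.

```python
# Hypothetical sketch: tallying a test blueprint by Bloom's taxonomy levels.
# The layout (section -> level -> number of items) is an assumption for
# illustration; the counts follow the example blueprint in Table 1.2.

BLOOM_LEVELS = ["Knowledge", "Comprehension", "Application",
                "Analysis", "Synthesis", "Evaluation"]

blueprint = {
    "A: Comprehension": {"Knowledge": 2, "Comprehension": 3, "Application": 1,
                         "Analysis": 1, "Synthesis": 2, "Evaluation": 1},
    "B: Grammar":       {"Knowledge": 4, "Comprehension": 3, "Application": 0,
                         "Analysis": 1, "Synthesis": 2, "Evaluation": 0},
    "C: Functions":     {"Knowledge": 2, "Comprehension": 2, "Application": 4,
                         "Analysis": 1, "Synthesis": 0, "Evaluation": 1},
}


def level_totals(bp):
    """Sum the item counts for each Bloom level across all sections."""
    return {level: sum(section.get(level, 0) for section in bp.values())
            for level in BLOOM_LEVELS}


def section_totals(bp):
    """Sum the item counts within each section."""
    return {name: sum(counts.values()) for name, counts in bp.items()}


def missing_levels(bp):
    """List the Bloom levels that have no items anywhere in the test."""
    totals = level_totals(bp)
    return [level for level in BLOOM_LEVELS if totals[level] == 0]


if __name__ == "__main__":
    totals = level_totals(blueprint)
    for level in BLOOM_LEVELS:
        print(f"{level:<14}{totals[level]:>3}")
    print(f"{'Total':<14}{sum(totals.values()):>3}")
    print("Levels not tested:", missing_levels(blueprint) or "none")
```

Run as a script, it prints one line per level. The level totals (8, 8, 5, 3, 4 and 2, or 30 items in all) should agree with the Total row of the blueprint, and an empty list of untested levels confirms that all six question types appear at least once. Storing counts rather than item numbers keeps the sketch short; a fuller version might store the item numbers themselves and also check that every item from 1 to 30 is classified exactly once.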
activity 1.5
(a) What do the terms tests, assessments and measurements mean and
how are they interconnected?
(b) What constitutes a good test? What are the four major parts of a test as
suggested by Wesche (1983)?
SUMMARY
• This topic has presented a discussion on various basic issues dealing with
tests and measurements.
• It has looked at terminology related to tests and measurements and
attempted to distinguish between terms which are similar.
• It has also attempted to situate testing within the instructional process,
taking into consideration instructional objectives as well as decisions.
KEY TERMS
Assessment
Evaluation
Measurement
Test