05 HBET4503 Topic 1

TOPIC 1 THE ROLE AND PURPOSE OF TESTING AND EVALUATION IN TESL
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Distinguish between tests, assessments and measurements;
2. Describe the basic parts of a test or evaluation; and
3. Describe the role of tests in the instructional process.
INTRODUCTION
It is important to fully understand the role and purpose of testing and evaluation
before we can discuss different ways of testing in TESL. In this topic, the roles
and purposes of testing and evaluation in TESL are discussed. This includes
a discussion of the differences between various terms related to basic
concepts in testing, the basic constituent parts of a test, and the role of tests in
the instructional and educational process, including decisions that are made on
the basis of test scores.
1.1 WHAT ARE TESTS, ASSESSMENTS AND MEASUREMENTS?
A course on testing may be called Testing and Measurement at one institution,
Testing and Evaluation at another or even simply Assessment at a third institution.
Copyright © Open University Malaysia (OUM)
These terms are obviously related. However, what do the terms mean and how
are they interconnected? Before we proceed further into the subject of testing,
it is appropriate that we first understand several basic yet important terms.
Perhaps the most important of these are the terms test, assessment and
measurement. Let us first look at the definitions of these three terms.
(a) Test
• A test can be defined as a systematic procedure for measuring a sample
of behaviour by posing a set of questions in a unified manner (Linn
& Gronlund, 1995: 6). The key phrases in this definition are systematic
procedure, measuring a sample of behaviour, and a set of questions in a unified
manner.
• A test is a systematic procedure because there is a planned format in
tests. A test cannot be haphazard, as a haphazard test would lose much
of its credibility as a test.
• A test also measures a sample of behaviour. In the case of language tests,
the sample of behaviour would be language proficiency or any
language-related construct we are interested in.
• Finally, questions or items in a test are seen to be unified. A traditional
view of test items is that they work in the same way by measuring the
same construct. If items in a test are not unified and measure different
constructs, what then does the test measure?
(b) Assessment
• Assessment is any of a variety of procedures used to obtain information
on students’ performance. Unlike a test, an assessment is seldom exclusively
quantitative.
• A teacher may assess student learning by simply looking at how students
respond to instruction.
• Students’ facial expressions can provide valuable information useful in
assessment.
• A test is an assessment although, as mentioned here, not all assessments
need to be tests.
• It should also be noted that the term evaluation can be considered
synonymous with assessment although some would limit its use to
programme evaluation and not the evaluation of student performance.
For the sake of brevity, I will consider both terms as synonymous.

(c) Measurement
Measurement is a numerical description of a
particular characteristic. We measure physical
objects in terms of their height, weight, and
depth. We can measure distance as well as length.
However, tests tend to measure behavioural and
cognitive aspects which are a lot more abstract
than physical objects. Nevertheless, all tests are
measurements. We have seen, however, that not all
measurements are tests.
The relationship between tests, assessments and measurement can be
illustrated in the following diagram:
Figure 1.1: The relationship between tests, measurement and assessment
(adapted from Bachman, 1990)
From Figure 1.1, we can conclude that all tests are measurements. Similarly,
tests can be assessments as well.
Bachman (1990) considers qualitative assessment of students as an
example of what may fall in area a, with teacher ranking of students’
performance falling in area b. Area c is represented by tests which are
also assessments and measurements. A clear example of this would
be an achievement test given to students at the end of an instructional
programme. Area d represents tests which are measurements but not
assessments. One such example would be research in which
tests are given. Finally, we will find many measurements that are
neither tests nor assessments. The age of students is a measurement
but it is not a test. Neither is it an assessment. These types of
measurements are represented by area e.
In the next few topics we will come across many different types of tests and
assessments. We will also examine measurements commonly used in tests.

1.2 WHAT CONSTITUTES A TEST?
ACTIVITY 1.1
Share with your friends some tests that you find good or bad. What are the
features of a good test?
There are a number of ways in which we can look at a test. We may want to examine
the characteristics of a good test and the issues of validity and reliability. These
issues, however, will be discussed in Topic 7 of this module. Here, it may be more
important to look at the basic structure of a test. If we were to dissect a test and
examine its anatomy, what would it look like? Wesche (1983) suggested four major
parts to a test. These four parts form a useful framework for examining
any kind of test:
• Stimulus material.
• Task posed to the learner.
• Learner’s response.
• Scoring criteria.
(a) Stimulus Material

The stimulus material refers to the text or material presented to the learner
or test taker. This can take many forms. In a reading comprehension test,
for example, the stimulus material could be the reading passage. There
may or may not be pictures or drawings accompanying the passage.
The passage could be on any of many possible topics and written in a
particular style. The stimulus material for this example would also include
the questions themselves – after having read the text, would the questions
be in the form of multiple-choice objective-type questions, or are students
required to write a summary?
(b) Task Posed to the Learner

The second component of the framework, the
task posed to the learner, is a somewhat abstract
concept. It involves actually determining the
mental or cognitive response required of the
learner toward the stimulus. If the reading
comprehension passage is again used as an
example, the cognitive response that is required is
for the learner to understand what is read. This
may require comprehension at various levels –
the word level, the sentence level, and the discourse level. The reading for
understanding process also involves other abilities and knowledge such
as inferencing and cultural knowledge. Similarly, this component will also
address how learners are expected to mentally and cognitively react to
the format of the question and what skills, sub-skills and abilities they are
required to draw on in order to complete the task.
(c) Learner’s Response

The learner’s response is the actual
demonstration of the student’s ability within
the limitations of the stimulus provided by the
test. If a reading comprehension passage is the stimulus
and the test items are in the form of multiple-choice
questions, then the learner’s actual
response is to select the correct answer based
on the question and the options that follow.
If a summary is required, then students will demonstrate their reading
comprehension ability by writing a short summary of the stimulus reading
passage. This component of Wesche’s framework is closely related to the
second component except that it is the actual physical performance of the
task which is seen to demonstrate the ability or behaviour being examined.
(d) Scoring Criteria

Finally, the fourth component of the framework is the scoring criteria. As a
test is a measurement, the scoring criteria are an important aspect of the overall
structure of a test. Your test may consist of different types of test items
– perhaps some are multiple-choice items while others require a short
answer. On what basis do you award the number of points for each type
of item? If you decide to award two points for multiple-choice questions and
five points for short-answer questions, on what basis are you making
such a decision? Proper justification of your decision will require careful
analysis of the tasks involved and the weightage that you assign to each item in
a test.
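
The weighting question raised here can be made concrete with a small calculation. The sketch below is only an illustration: the two-point and five-point values come from the example above, while the item counts and the learner's results are invented for the purpose, and the names `ItemType`, `max_score`, `weight_share` and `total_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ItemType:
    name: str
    points_per_item: int  # weight chosen by the test constructor
    n_items: int          # assumed number of items of this type


def max_score(item_types):
    """Highest total a learner can obtain on the whole test."""
    return sum(t.points_per_item * t.n_items for t in item_types)


def weight_share(item_types):
    """Fraction of the maximum score contributed by each item type."""
    total = max_score(item_types)
    return {t.name: t.points_per_item * t.n_items / total for t in item_types}


def total_score(item_types, n_correct):
    """Score for one learner, given the number of correct items per type."""
    return sum(t.points_per_item * n_correct[t.name] for t in item_types)


# Two points per multiple-choice item and five per short-answer item, as in
# the example in the text; the item counts (10 and 4) are assumptions.
test_items = [
    ItemType("multiple_choice", points_per_item=2, n_items=10),
    ItemType("short_answer", points_per_item=5, n_items=4),
]

shares = weight_share(test_items)
learner_total = total_score(
    test_items, {"multiple_choice": 8, "short_answer": 3}
)
print(max_score(test_items), shares, learner_total)
```

With these assumed counts, four short-answer items carry as much weight as ten multiple-choice items. A consequence of this kind is exactly what the test constructor should be able to justify.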

So, once again, the four components of a test according to the Wesche
framework are the stimulus material, the task posed to the learner, the
learner’s response, and the scoring criteria. All four components are highly
inter-related and important in testing. Almost every test can be analysed
according to these four components. More importantly, when we construct a
test, we are actually able to make the test easier or more difficult by varying
each component of this framework.

ACTIVITY 1.2
The following is a test question taken from an English textbook.

Identify the noun phrases in the following sentences (1 point for every correctly
identified noun phrase):
(a) Do you know where he is?
(b) She didn’t know if the teacher was coming.
(c) The policeman stopped me as I was parking my car.
Use the Wesche (1983) framework to describe this test item.
1.3 A PRELIMINARY UNDERSTANDING OF TESTS AND THE INSTRUCTIONAL PROCESS
What role do tests have in instruction? It is important for an educator to
understand how tests and instruction are related. Do we test what we teach?
Or should we teach what we test? In an ideal world, these questions may
be moot as the relationship between testing and teaching is seamless as both
serve the purpose of helping students learn. However, in the real world, it is
possible to justify an affirmative answer to each of the two questions. Yes, we
should test what we teach in order to assess the extent to which our students
have understood or perhaps even mastered what has been presented to them.
However, in this world where examinations can play an important role in
determining our future, who can blame teachers who teach what is being tested –
i.e. prepare students only for the test?
So what exactly is the relationship between testing and teaching? Perhaps we can
try to get an initial idea with the help of the simple diagram in Figure 1.2.
Figure 1.2: The relationship between planning, instruction, and testing

In this model, we are reminded that instruction itself is guided by curriculum
planning. Testing represents the final stage of a three-stage process beginning
with curriculum planning or instructional objectives, followed by the actual
instruction itself, and finally culminating in testing. The model also suggests a
“washback” from testing to both the curriculum specification and instruction stages.
The concept of washback will be discussed in greater detail later.
The model suggested by Figure 1.2, however, is clearly a simplified and idealised
one. Such a model may work well if all three components are under the purview
of a single person or small group of people. However, when it is applied to a
national scenario, the linear process is no longer so easy or so likely. Some
of the objectives of the curriculum specifications may be lost in instruction
especially as those who carry out the teaching may not be directly involved in
curriculum planning. Similarly, national standardised tests or examinations may
also fail to capture the emphases placed during instruction as test constructors
in these examinations are not those who had actually carried out the teaching.
Nevertheless, for want of a conceptual idea of the position of testing in
instruction, this simple model in Figure 1.2 would suffice for the moment. We
will revisit the model in later topics when we will, hopefully, have a clearer and more
comprehensive understanding of tests and instruction.
It should be noted here that the nature of tests is affected by the nature or
approach of instruction. We need only look at the history of language testing to see
the truth of this statement. It was once described to me that language testing had
undergone three major historical shifts or phases.
• The first phase, the pre-scientific phase, coincides with a time when teachers
were thought to be competent in constructing tests simply by virtue of being
teachers. It was felt that if they could teach, then they could test.
• A more ‘scientific’ era, heralded by behaviourism and audiolingualism, saw
the notion of psychometric structuralism, in which measurement of structural
knowledge of language was given top priority.
• Finally, language tests were influenced by the communicative approach
movement, and a sociolinguistic integrative perspective in testing was
adopted.
Each of the three phases, of course, coincided with theories of and approaches
to language learning and teaching of the time. This further reinforces the notion
that there is a close relationship between teaching and testing.
1.3.1 Taxonomies of Instructional Objectives

Perhaps an even more important factor in examining the role of tests in an
instructional process is the instructional objectives. Test items should be based on
instructional objectives, especially if we wish to know whether the instruction has
been effective. In this respect, taxonomies such as those suggested by Bloom and
Barrett are useful tools in ensuring the most appropriate questions are asked in
tests.
Table 1.1: Bloom’s Taxonomy and Representative Test Questions
Adapted from: Nitko, 2001: 27
Level Test Question

Knowledge Who are the main characters of the story?
Comprehension What is the main theme of the story?
Application Can the solutions found to the problems in this story be used in
Analysis What literary devices are being used to convey to the reader the
Synthesis Based on this story as well as other stories you have read,
describe general strategies that main characters in stories have taken
to overcome the problems that they face.
Evaluation Develop a set of three or four criteria for assessing the quality of a
story and use these criteria to assess any story that we have read.
Bloom’s taxonomy consists of six levels which are generally considered to be
hierarchical. This means that not only are the higher level skills more cognitively
demanding, they also assume that the skills lower in the taxonomy have already been
mastered. The levels of knowledge, comprehension and application are often
referred to as the lower order skills, with knowledge being at the lowest end.
Analysis, synthesis, and evaluation are considered the higher order skills with
evaluation occupying the highest end of the taxonomy. In Table 1.1, each level
of Bloom’s taxonomy is accompanied by a matching question that reflects the
cognitive demands that it places on the students.
Bloom’s taxonomy focuses on cognitive abilities and may have limitations when
used in language teaching and learning. Other taxonomies, such as Barrett’s
taxonomy, have been developed for more language-related skills. This taxonomy
consists of four levels: literal recognition or recall; inference; evaluation; and
appreciation. Each level consists of several sub-levels. Barrett’s taxonomy
focuses on reading and is especially relevant for language teaching and learning.
However, what is also needed are taxonomies of the productive language skills of
writing and speaking. In second language situations, such taxonomies would be
useful in charting progress in learning as well as in specifying a comprehensive
teaching plan.

ACTIVITY 1.3
Here is a brief passage from a short story:

Mum is hard-working. She never stops. She works day and night with
hardly a break. I seldom see her holding a cup of tea, sitting back and
relaxing while watching TV. And when she does watch a funny movie
(not very often, as I said before), she doesn’t laugh. Again, I don’t know
why. Maybe she just doesn’t understand what is so funny about the jokes.
Or maybe she’s just too serious.
(From My mother and my diary by Ruth Ming Har Wong).
Try to create a question for each of the six levels of Bloom’s taxonomy.
1.4 HOW DO STUDENTS BENEFIT FROM A TEST?
While the kinds of decisions made on the basis of tests are largely teacher and educator
centred, tests also provide students with several benefits. First, there is the
benefit of motivation. Whenever a teacher announces that there will be a test,
the tendency for most students is to study and revise material in preparation
for the test. In other words, the test acts as an impetus for study. This form of
motivation is useful when it is used sparingly, as teachers should not depend only
on tests to motivate students.
The better students also use tests as a source of information. Feedback from test
scores informs students of their strengths and weaknesses, whether their study
approach has been beneficial, and whether they have understood the material taught.
In other words, information in the form of test results is as important for
the student as it is for the teacher. As such, it should be general practice to
return test papers as often and as quickly as possible. A different way of looking
at things is that teachers are now presented with a new responsibility – i.e. to
develop in their students the ability and self-directedness to use information from
such sources as test results to learn and plan their own learning.
1.5 DECISIONS MADE BASED ON TESTS

Why do we test? Do teachers and instructors have a sadistic streak, setting
tests simply to see their students slog and burn the midnight oil preparing for
the test? Certainly not! There are more noble intentions in testing. We can say
that the main purpose of tests is to obtain information concerning a particular
behaviour or characteristic. Based on information obtained from tests, several
different types of decisions can be made.
Kubiszyn & Borich (2000) mention eight different types of decisions made on the
basis of information obtained from tests. These educational decisions are shown
in Figure 1.3.
Figure 1.3: Types of educational decisions
The first three decisions are often within the domain of the classroom teacher. He
or she can make decisions with respect to instruction, grading as well as diagnostic
activities.
Instructional decisions are made based on test results when, for example, teachers
decide to change or maintain their instructional approach. If a teacher finds
out that most of his class have failed his test, there are many possible reactions
he can have. First, he could be very disappointed, blame the students for not
studying and punish them in some way. Of course, this is not a wise decision to
make. Instead, the teacher could evaluate the effectiveness of his own teaching
or instructional approach. An instructional decision is made when the teacher
decides whether to keep the approach currently used. Perhaps the teacher may decide that
the approach is not suitable and a different approach should be used.
Tests yield scores and teachers will have to make decisions in terms of the kind of
grades to give students. As grades are indicators of student performance, teachers
need to decide whether a student deserves a high grade – perhaps an A – on the
basis of some form of assessment. Traditionally, and perhaps for a long time to
come, this assessment will be in the form of tests.

Sometimes, we give tests to find out the strengths and weaknesses of our
students. Can they correctly construct a passive sentence? Do they use the
different pronoun forms correctly? These kinds of questions can be answered
by observing student performance on tests. When a teacher decides that he
will spend more time teaching passive sentences because student performance
on such sentences in a test was unsatisfactory, then he has made a diagnostic
decision.
Decisions related to selection, placement, counselling and guidance, programme
or curriculum, and administrative policy are all made at levels higher than the
classroom. Administrators, educational agencies and institutions may be involved
in these decisions.
Selection and placement decisions are somewhat similar. However, a selection
decision relates to whether or not a student is selected for a programme or for
admission into an institution based on a test score. Tests such as TOEFL and
IELTS are often used by universities to decide whether a candidate is suitable,
and hence selected for admission. A placement decision, however, deals with
where a candidate should be placed based on performance on the test. A clear
example is the language placement examination for newly admitted students
commonly administered by many local and foreign universities. Based on their
performance on such a test, students are placed into different language classes
that are arranged according to proficiency levels.
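
The contrast between the two decisions can be sketched in a few lines of code: a selection decision returns a yes or no answer, while a placement decision returns a class. Everything specific in this sketch is an assumption made for illustration, including the 0 to 100 score scale, the cut scores, the class names, the admission minimum, and the function names `place` and `select`.

```python
from bisect import bisect_right

# Hypothetical cut scores on a 0-100 placement test. Real cut scores would
# have to be set and validated by the institution concerned.
CUT_SCORES = [40, 60, 80]
CLASS_LEVELS = ["Beginner", "Intermediate", "Upper-intermediate", "Advanced"]


def place(score):
    """Placement decision: return the class whose score band contains score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # bisect_right counts how many cut scores are <= score, which is the
    # index of the band that the score falls into.
    return CLASS_LEVELS[bisect_right(CUT_SCORES, score)]


def select(score, minimum=60):
    """Selection decision: admit only if the score reaches the minimum."""
    return score >= minimum


for s in (39, 40, 79, 80):
    print(s, select(s), place(s))
```

Under these assumed bands, a candidate scoring 79 would be selected and placed in the Upper-intermediate class, while a candidate scoring 39 would not be selected at all.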
Counselling and guidance decisions are also made by relevant parties such
as counsellors and administrators on the basis of exam results. Counsellors
often give advice in terms of appropriate vocations for some of their students.
This advice is likely to be given on the basis of the students’ own test scores.
Programme or curriculum decisions reflect the kinds of changes made to the
educational programme or curriculum based on examination results. Finally,
there are also administrative policy decisions that need to be made which are also
greatly influenced by test scores.
1.6 HOW DO WE CONSTRUCT A TEST?
ACTIVITY 1.4
What do you think are students’ reactions towards tests? Do they enjoy or
fear tests? Discuss with your coursemates.

Figure 1.4: General stages in test construction
The framework of a test is reflected in the way the test is constructed. The
first stage in constructing a test is to determine what is to be tested. This is not
as easy as it seems because it requires determining the theoretical construct
of what is to be tested. For example, let’s assume that we are interested in
testing communicative competence. This requires that a theoretical construct
of communicative competence be first determined. Various theories of
communicative competence have been suggested (cf. Bachman, 1990; Canale
& Swain, 1980). We need to examine these theories and determine what
communicative competence is to us for the purpose of our test.
The second step in test construction is to operationalise the theoretical construct.
A theoretical construct must necessarily be an idealised and abstract notion.
When it is operationalised, it is reduced in order to fit into the constraints of a
test. The many different formats of tests – multiple choice, dictation, essay-type,
matching, etc. – represent the different kinds of operationalisation available in
tests.
Finally, the third step in constructing a test is quantification. As a test is a measure,
numbers and quantities will be a necessary element. Once again, just as
with the previous stages, we may tend to take this stage of test construction
for granted. There is more to quantification than simply assigning numbers or
points to items in a test. If a test consists of two sections using different formats
such as multiple choice questions and short answer questions, what weightage
of points would you assign to the items in each section? Even the assignment of
these points must be justified.
The steps described above provide a general description of the test construction
process. In actual practice, there may be some additional steps that need to be
taken. Some time ago, I was asked to construct a test of English language
proficiency for a private company. When I set out to do the task, I listed the
steps that I would probably have to take. One of the first steps I felt necessary was some
form of needs analysis in order to determine what kind of language should be
tested. I wanted to find out from the management what sort of test they wanted
and whether what I had in mind fit their requirements. My intention was to draft
the test, show the draft to the management for approval, pilot it and later validate
the test in some way.

I would also imagine that if I were teaching in the public schools, I would
probably not spend so much time on the three steps described earlier – theoretical
construct, operationalisation, and quantification – because the test construction
process has largely been determined by the Ministry of Education. The national
standardised Sijil Pelajaran Malaysia is already an embodiment of the three
stages and teachers merely need to follow the model examination paper with
respect to these three elements. However, it may be helpful to construct a test
blueprint in order to ensure that my test spans the necessary content and that
there is a variety of skills or abilities being tested.
Table 1.2: Example of Test Blueprint According to Bloom’s Taxonomy
Section                    Knowledge       Comprehension  Application  Analysis  Synthesis  Evaluation  Total
Section A (Comprehension)  1, 3            2, 4, 5        8            6         7, 10      9           10
Section B (Grammar)        12, 16, 18, 19  11, 14, 17     13           15, 20    –          –           10
Section C (Functions)      21, 23          22, 29         24, 25, 26   –         27, 28     30          10
Total                      8               8              5            3         4          2           30
There are numerous ways of forming test blueprints, some more comprehensive
than others (see Nitko, 2001 for several examples), but an important point to
remember is that the test blueprint should be used only as a tool rather than to
“promote exact or rigorous classification” (Nitko, 2001: 113). Nevertheless, the most
common form of test blueprint in schools in Malaysia incorporates Bloom’s
taxonomy as its primary method of classification. In the example in Table 1.2, the
30 items in the test are categorised according to Bloom’s taxonomy. The numbers
1 to 30 in the blueprint refer to the test item numbers. Items 1 and 3,
for example, are items in the comprehension section which test knowledge, while item 8 tests
application. A blueprint such as this is useful in ensuring that different kinds of
questions are asked. In this particular example, most questions are knowledge
and comprehension type questions (8 each), which tends to be quite common.
However, all six question types are quite well represented and, as such, the test
itself can be considered acceptable.
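
The totals in a blueprint of this kind are easy to check mechanically. The following sketch, offered as an illustration rather than as part of the blueprint method itself, encodes the item numbers from Table 1.2 and recomputes the row and column totals. The placement of items 15, 20, 27 and 28 among the higher order columns follows one reading of the table and should be treated as an assumption; the function names are hypothetical.

```python
from collections import Counter

BLOOM_LEVELS = ["Knowledge", "Comprehension", "Application",
                "Analysis", "Synthesis", "Evaluation"]

# Item numbers per section and level, following one reading of Table 1.2.
# Assumption: items 15 and 20 sit under Analysis, 27 and 28 under Synthesis.
blueprint = {
    "Section A (Comprehension)": {
        "Knowledge": [1, 3], "Comprehension": [2, 4, 5], "Application": [8],
        "Analysis": [6], "Synthesis": [7, 10], "Evaluation": [9],
    },
    "Section B (Grammar)": {
        "Knowledge": [12, 16, 18, 19], "Comprehension": [11, 14, 17],
        "Application": [13], "Analysis": [15, 20],
    },
    "Section C (Functions)": {
        "Knowledge": [21, 23], "Comprehension": [22, 29],
        "Application": [24, 25, 26], "Synthesis": [27, 28],
        "Evaluation": [30],
    },
}


def level_totals(bp):
    """Number of items at each Bloom level, in taxonomy order."""
    counts = Counter()
    for cells in bp.values():
        for level, items in cells.items():
            counts[level] += len(items)
    return [counts[level] for level in BLOOM_LEVELS]


def section_totals(bp):
    """Number of items in each section of the test."""
    return {section: sum(len(items) for items in cells.values())
            for section, cells in bp.items()}


def every_item_once(bp, n_items):
    """True if each item number from 1 to n_items appears exactly once."""
    seen = sorted(i for cells in bp.values()
                  for items in cells.values() for i in items)
    return seen == list(range(1, n_items + 1))


print(level_totals(blueprint))
print(section_totals(blueprint))
print(every_item_once(blueprint, 30))
```

A check like `every_item_once` catches an item that has been left out of the blueprint or classified twice, which is easy to miss when the blueprint is drawn up by hand.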

ACTIVITY 1.5
(a) What do the terms tests, assessments and measurements mean and
how are they interconnected?
(b) What constitutes a good test? What are the four major parts of a test as
suggested by Wesche (1983)?
SUMMARY
• This topic has presented a discussion on various basic issues dealing with
tests and measurements.
• It has looked at terminology related to tests and measurements and
attempted to distinguish between terms which are similar.
• It has also attempted to situate testing within the instructional process,
taking into consideration instructional objectives as well as decisions.
Assessment Measurement
Evaluation Test
