LANGUAGE TESTING
Chapter 3 & 8
Lecturer: Prof. Dr. Hj. Djamiah Husain, M.Hum.
By:
The Second Group
CLASS C
GRADUATE PROGRAM
STATE UNIVERSITY OF MAKASSAR
2016
INTRODUCTION
Language testing is designed to find out what students have learned, both in language skills and in language areas. Related terms are language assessment and evaluation. Language assessment is used in free variation with language testing, although it is also used somewhat more widely to include, for example, classroom testing for learning and institutional examinations. Language testing is a field within applied linguistics that essentially focuses on evaluating a person's proficiency in a language.
Testing also has an ethical dimension insofar as it affects people's lives (see Davies (ed.) 1997). This leads us into the area of consequential validity, where we are concerned with a test's impact on individuals, institutions, and society, and with the use that is made of test results. Getting it right, that is, ensuring test fairness, is a necessity for testing, not an ideal. In developing assessment tools, a decision must be taken on what the criterion is in the particular domain under review, and this decision and the test measures used for operationalizing it must be ethically defensible. Test developers must be held accountable for their products.
Test validation is the process of generating evidence to support the well-foundedness of inferences about a trait drawn from test scores; essentially, testing should be concerned with evidence-based validity. Test developers need to provide a clear argument for a test's validity in measuring a particular trait, with credible evidence to support the plausibility of this interpretative argument (see Kane 1992).
Language evaluation itself serves various purposes in education. Student evaluation gauges students' growth, development, and progress against stated learning objectives, making judgments on the basis of the information collected. Evaluation also tells educators the strengths and weaknesses of a program so that adjustments and adaptations can be made. In addition, teachers grow professionally when they reflect on their own teaching and keep informed of current instructional strategies and evaluation methods they may use in their programs.
Test questions may be written by teachers themselves or by others. Some teachers, however, do not understand how to construct good-quality test items, and they do not know how to judge the quality of items that have been or will be administered. This is partly due to their unfamiliarity with language testing and language tests, which is why language testing needs to be understood: it provides the basis for testing language.
In testing language, it is necessary to know what should be tested. As mentioned above, language testing involves language skills and language areas. Four skills of English should be tested: listening comprehension, speaking ability, reading comprehension, and writing ability. Testing the language areas, in turn, involves tests of grammar and usage, tests of vocabulary, and tests of phonology.
DISCUSSION
on test scores when screening or selection decisions are being made. In order for such decisions to be fair, our tests must be accurate in the sense that they must provide information that is both reliable and valid.
In the area of language testing, a common screening instrument is the aptitude test. It is used to predict the success or failure of prospective students in a language learning program.
3. Placement
Closely related to the notions of diagnosis and selection is the concept of placement. In this case a test is used to identify a particular performance level of a student and to place him or her at an appropriate level of instruction. It follows that a given test may serve a variety of purposes; thus the UCLA Placement Exam may be used to assign students to levels as well as to screen students with extremely low English proficiency from participation in regular university instruction.
A placement test identifies the right class for a particular learner; there is no such thing as a good score or a bad score, only a recommendation for the most suitable class. Obviously, the tester must know which classes or levels are available before placing the learner.
4. Program Evaluation
Here the focus of evaluation is not the individual student so much as the actual program of instruction. Therefore, group mean scores are of greater interest than the scores of individual students. Often one or more pretests are administered to assess gross levels of student proficiency, or "entry behavior," prior to instruction. Following the sequence of instruction, one or more posttests are administered to measure post-instructional levels of proficiency, or "exit behavior." The difference between the pretest and posttest scores for each student is referred to as a gain score.
Frequently in program evaluation, tests or quizzes are administered at intervals throughout the course of instruction to measure "en route behavior." If the results of these tests are used to modify the program to better suit the needs of the students, the process is termed formative evaluation. The final exam or posttest is administered as part of the process of what is called summative evaluation.
Sometimes a language program may be evaluated by comparing the mean posttest or gain scores of one program or partial program with those of other programs. Whatever the method of evaluation, the importance of sensitive, reliable, and valid tests is obvious.
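The gain-score arithmetic just described can be sketched in Python (a minimal illustration; the student labels and scores below are invented):

```python
# Gain scores in program evaluation: the difference between each
# student's posttest ("exit behavior") and pretest ("entry behavior").
# Student labels and scores are invented for illustration.

pretest = {"student_1": 45, "student_2": 60, "student_3": 52}
posttest = {"student_1": 70, "student_2": 74, "student_3": 66}

# Gain score per student: posttest minus pretest.
gains = {s: posttest[s] - pretest[s] for s in pretest}

# Group mean gain, the figure typically compared across programs
# in summative evaluation.
mean_gain = sum(gains.values()) / len(gains)

print(gains)
print(round(mean_gain, 2))
```

Comparing such mean gains across programs is one simple way to operationalize the program comparison mentioned above.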
5. Providing Research Criteria
Language test scores often provide a standard of judgment in a variety of other research contexts. Comparisons of methods and techniques of instruction, textbooks, or audiovisual aids usually entail reference to test scores. Even examination of the structure of the language itself, or of the physiological and psychological processes of language use, may involve some form of measurement or testing. If we are to learn about effective methods of teaching, strategies of learning, presentation of material for learning, or the description of language and linguistic processes, greater effort will need to be expended on the development of suitable language tests.
6. Assessment of Socio-Psychological Differences
Attitude toward the target language, its people, and their culture has been identified as an important affective correlate of good language learning. It follows that appropriate measures are needed to determine the nature, direction, and intensity of attitudes related to language acquisition. Apart from attitudes, other variables such as the cognitive style of the learner, socioeconomic status, locus of control, the linguistic situational context, and the ego permeability of the learner have been found to relate to level of language achievement and/or strategies of language use. Each of these factors must in turn be measured reliably and validly in order to permit rigorous scientific inquiry, description, explanation, and/or manipulation. This is offered as further evidence of the value of a wide variety of tests serving a variety of important functions.
they describe two or more extremes located at the ends of the same continuum. Many of the categorizations are merely mental constructs to facilitate understanding. The fact that there are so many categories, and that there is so much overlap, seems to indicate that few of them are entirely adequate in and of themselves, particularly the broadest categories. This part describes the types of language tests: objective vs. subjective tests; direct vs. indirect tests; discrete-point vs. integrative tests; aptitude, achievement, and proficiency tests; criterion-referenced vs. norm-referenced tests; speed tests vs. power tests; and other test categories.
1. Objective vs. Subjective Tests
An objective test is one that may be scored by comparing examinee responses with an established set of acceptable responses, or scoring key. No particular knowledge or training in the examined content area is required on the part of the scorer. A common example would be a multiple-choice recognition test. Conversely, a subjective test is said to require scoring by opinion or judgment, hopefully based on insight and expertise, on the part of the scorer.
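Objective scoring of this kind reduces to comparing responses with a fixed key, which can be sketched in Python (the items, key, and responses below are invented):

```python
# Objective scoring: compare each examinee response with an established
# answer key; no expertise in the content area is needed to score.
# Items, key, and responses are invented for illustration.

answer_key = {1: "B", 2: "D", 3: "A", 4: "C"}
responses = {1: "B", 2: "C", 3: "A", 4: "C"}

# An item earns a point only when the response matches the key exactly.
score = sum(1 for item, key in answer_key.items()
            if responses.get(item) == key)

print(score)  # 3 of the 4 items match the key
```

Any trained or untrained scorer (or a machine) applying this key would produce the same score, which is precisely what makes the scoring objective.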
Many tests, such as cloze tests that permit all grammatically acceptable responses to systematic deletions from a context, lie somewhere between the extremes of objectivity and subjectivity. Similarly, so-called subjective tests such as free compositions are frequently objectified in scoring through the use of precise rating schedules that clearly specify the kinds of errors to be quantified, or through the use of multiple independent raters.
Objectivity-subjectivity labels, however, are not always confined in their application to the manner in which tests are scored. These descriptions may also be applied to the mode of item or distracter selection by the test developer, to the nature of the response elicited from the examinee, and to the use that is made of the results for any given individual. Often the term subjective is used to mean unreliable or undependable. The possibility of misunderstanding due to such ambiguity suggests that objective-subjective labels for tests are of very limited utility. Objective and subjective tests are discussed further in another part.
2. Direct vs. Indirect Tests
It has been said that certain tests, such as ratings of language use in real and uncontrived communication situations, test language performance directly, whereas other tests, such as multiple-choice recognition tests, tap true language performance obliquely or indirectly and are therefore less valid for measuring language proficiency. Whether or not this observation is true, many language tests can be viewed as lying on a continuum from natural-situational to unnatural-contrived. Thus an interview may be thought of as more direct than a cloze test for measuring overall language proficiency, and a contextualized vocabulary test as more natural and direct than a synonym-matching test.
The issue of test validity is treated in greater detail in another part. It should be noted here that the usefulness of tests should be decided on the basis of other criteria in addition to whether they are direct or natural. Sometimes tests are explicitly designed to elicit and measure language behaviors that occur only rarely, if at all, in more direct situations. Sometimes most of the value of direct language data is lost through reductionism in the manner of scoring.
3. Discrete-Point vs. Integrative Tests
Another way of slicing the testing pie is to view tests as lying along a continuum from discrete-point to integrative. Discrete-point tests, as a variety of diagnostic test, are designed to measure knowledge or performance in a very restricted area of the target language. Thus a test of the ability to use the perfect tenses of English verbs correctly, or to supply correct prepositions in a cloze passage, may be termed a discrete-point test. Integrative tests, on the other hand, are said to tap a greater variety of language abilities concurrently and therefore may have less diagnostic and remedial-guidance value but greater value in measuring overall language proficiency. Examples of integrative tests are random cloze, dictation, oral interviews, and oral imitation tasks.
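The systematic deletion behind a fixed-ratio cloze test can be sketched in Python (the sample passage and the deletion ratio n = 5 are arbitrary choices for illustration):

```python
# Fixed-ratio cloze: delete every nth word from a passage and record
# the deleted words as the answer key. Passage and ratio are arbitrary.

def make_cloze(text, n=5):
    words = text.split()
    answers = []
    for i, word in enumerate(words, start=1):
        if i % n == 0:          # every nth word is blanked out
            answers.append(word)
            words[i - 1] = "____"
    return " ".join(words), answers

passage = ("Language testing is designed to find out what students "
           "have learned in language skills and language areas")
cloze_text, answers = make_cloze(passage, n=5)
print(cloze_text)
print(answers)
```

A random cloze would instead choose the deletion points at random; either way, scoring may accept only the exact deleted word or, more leniently, any grammatically acceptable response, as noted earlier.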
Here again, some tests defy such ready-made labels and may place the label advocates on the defensive. A test of listening comprehension may tap only one of the four general language skills (i.e., listening, speaking, reading, and writing) in a discrete manner and thus have limited value as a measure of overall language proficiency. On the other hand, such a test may examine a broad range of lexis and diverse grammatical structures and in this way be said to be integrative.
4. Aptitude, Achievement, and Proficiency Tests
Aptitude tests are most often used to measure the suitability of a candidate for a specific program of instruction or a particular kind of employment. For this reason, these tests are often equated with intelligence tests or screening tests. A language aptitude test may be used to predict the likelihood of success of a candidate for instruction in a foreign language; the Modern Language Aptitude Test is a case in point. Frequently, vocabulary tests are effective aptitude measures, perhaps because they correlate highly with intelligence and may reflect knowledge of and interest in the content domain.
A language aptitude test (prognostic test) is designed to measure students' probable performance in a foreign language which they have not started to learn; it assesses aptitude for learning a language. Language learning aptitude is a complex matter, consisting of such factors as intelligence, age, motivation, memory, phonological sensitivity, and sensitivity to grammatical patterning.
Achievement tests are used to measure the extent of learning in a prescribed content domain, often in accordance with explicitly stated objectives of a learning program. These tests may be used for program evaluation as well as for certification of learned competence. It follows that such tests normally come after a program of instruction and that the components or items of the tests are drawn directly from the content of instruction. If the purpose of achievement testing is to isolate learning deficiencies in the learner with the intention of remediation, such tests may also be termed diagnostic tests.
Achievement (attainment) tests are based on what the students are presumed to have learnt, not necessarily on what they have actually learnt, nor on what has actually been taught. Achievement tests are more formal and are intended to measure achievement on a large scale.
Proficiency tests are most often global measures of ability in a language or other content area. They are not necessarily developed or administered with reference to some previously experienced course of instruction. These measures are often used for placement or selection, and their relative merit lies in their ability to spread students out according to ability on a proficiency range within the desired area of learning.
It is important to note that the primary differences among these three kinds of test lie in the purposes they serve and the manner in which their content is chosen. Otherwise, it is not uncommon to find identical items occurring in aptitude, achievement, and proficiency tests.
Norm-referenced tests are not without their share of weaknesses. Such tests are usually valid only with the population on which they have been normed. Norms change over time as the characteristics of the population change, and therefore such tests must periodically be renormed. Since such tests are usually developed independently of any particular course of instruction, it is difficult to match their results perfectly with instructional objectives. Test security must be rigidly maintained, and debilitating test anxiety may actually be fostered by such tests. It has also been objected that, since the focus is on the average score of the group, the test may be insensitive to fluctuations in the individual. This objection relates to the concept of reliability discussed in another part, and may be applied to criterion-referenced as well as norm-referenced tests.
Some teachers may fail to grasp the distinctions between criterion-referenced and norm-referenced testing. It is common to hear the two types of testing referred to as if they served the same purposes or shared the same characteristics. Much confusion can be eliminated if the basic differences are understood.
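The norm-referenced interpretation discussed here can be illustrated with a percentile-rank sketch in Python (the norming sample below is invented): a score means little on its own and is instead reported by its position in the norming population.

```python
# Norm-referenced interpretation: a score is reported as a percentile
# rank, the percentage of the norming population scoring below it.
# The norming sample is invented for illustration.

def percentile_rank(score, norm_scores):
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)

norm_scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

print(percentile_rank(62, norm_scores))  # half the norm group scored lower
print(percentile_rank(82, norm_scores))
```

This also makes the renorming problem concrete: if the norming population changes, the same raw score maps to a different percentile, so the norms must be recollected.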
6. Speed Tests vs. Power Tests
A pure speed test is one in which the items are so easy that every person taking the test might be expected to get every item correct, given enough time. But sufficient time is not provided, so examinees are compared on speed of performance rather than on knowledge alone. Conversely, power tests by definition allow sufficient time for every person to finish, but contain such difficult items that few if any examinees are expected to get every item correct. Most tests fall somewhere between the two extremes, since knowledge rather than speed is the primary focus, but time limits are enforced because weaker students may otherwise take unreasonable periods of time to finish.
7. Other Test Categories
The few salient test categories mentioned here are by no means exhaustive. Mention could be made of examinations vs. quizzes and questionnaires. A distinction could be made between single-stage and multi-stage tests, as is done in another part. Contrast might be made between language skill tests and language feature tests, or between production and recognition tests.
At a still lower level of discrimination, mention can be made of cloze tests, dictation tests, multiple-choice tests, true-false tests, essay/composition/précis tests, memory-span tests, sentence completion tests, word-association tests, and imitation tests, not to mention tests of reading comprehension, listening comprehension, grammar, spelling, auditory discrimination, oral production, listening recall, vocabulary recognition and production, and so on.
3. Ali ought not to ________ me his secret, but he did.
A. tell
B. having told
C. be telling
D. have told
4. A. Ali ought not to tell me his secret, but he did.
B. Ali ought not to having told me your secret, but he did.
C. Ali ought not to be telling me your secret, but he did.
D. Ali ought not to have told me your secret, but he did.
b. The man (A) / enjoyed (B) / looking the children (C) / playing in the yard (D).
c. Rini's mother (A) / does not let her (B) / to play (C) / on the dirty floor (D).
3. Rearrangement Tests
Rearrangement items can be in the form of multiple-choice items or in other forms. Consider the following example:
Well, you know how ..........
A. warm is it today
B. is it warm today
C. today it is warm
D. warm it is today
Since this type of multiple-choice arrangement may be confusing, the item may instead be written in word-order form:
Complete each sentence by putting the words below it in the right order; write in the boxes only the letters of the words.
Well, you know how ..........
A. it B. today C. warm D. is
Not only .........., but he also took me to his house.
A. me B. he C. did D. meet
4. Completion Tests
The completion item is a useful means of testing the student's ability to produce acceptable and appropriate forms of language. It measures production rather than recognition, testing the ability to insert the most appropriate words in selected blanks in sentences. The words selected for omission are grammatical or functional words.
The answer to the above sentence is the, and there is no other possible answer. The following example indicates the wide range of possibilities for one completion item:
The answer obviously required by the tester is haven't been; however, other answers are possible.
There are three possible ways of restricting the available answers: providing context, providing data, or using multiple-choice techniques. Completion items in context may take the form of blanks, or the omissions may not be indicated at all; in the latter case the students are required to put a slash (/) at the place where a word has been omitted and then to write the missing word in the appropriate space. Consider the following example:
5. Transformation Tests
The transformation type of item is exceptionally useful for testing the ability to produce structures in the target language, although transforming sentences is different from producing sentences.
Rewrite each of the following sentences in another way, beginning each new sentence with the words given. Make any changes that are necessary, but do not change the general meaning of the sentence.
1. I haven't written to you for a long time.
It's a long time…………………………………..
2. Ahmad can sing better than you.
You cannot………………………………………
CONCLUSION
REFERENCES
Jabu, Baso. 2008. English Language Testing. Makassar: UNM Publisher.
Larsen-Freeman, D. 2001. Teaching Grammar. In M. Celce-Murcia (ed.), Teaching English as a Second or Foreign Language (3rd ed., pp. 251-266). Boston, MA: Thomson/Heinle.
Mart, Ç. T. 2013. Theory and Practice in Language Studies, Vol. 3, No. 1, pp. 124-129. Erbil: Department of Languages, Ishik University.
Thornbury, Scott. 1999. How to Teach Grammar. Essex: Pearson Education Limited.
http://washington.academia.edu/PriscillaAllen, accessed 3 September 2016.