Evaluation: involves looking at all the factors that influence the learning process, e.g.
syllabus, objectives, course design, materials and methodology, as well as teacher
performance.

Assessment: involves measuring the performance of our students and the progress they
are making. It helps us to diagnose the problems they have and to provide them with
useful feedback.
"any means of checking what students can do with language" (Wingard, 1981: 171)

Testing: the way assessment is formalized in specific tools such as exams, quizzes or
even checklists that help teachers and students keep a detailed record of results.

INFORMAL ASSESSMENT (continuous assessment)


It's a way of collecting information about our students' performance under normal
classroom conditions.

Sometimes referred to as continuous assessment -> done over a period of time, such as a
term or an academic year.

● It is important to link the informal assessment we do with our formal assessment
(tests) and with the self-assessment done by the students themselves.

● We must establish clear criteria for assessing the students.

FORMAL ASSESSMENT
● Examinations or any type of language test administered in class by a teacher
SELF ASSESSMENT AND PEER-ASSESSMENT
The involvement of learners in the assessment of their own (or each other's) learning.
● May enhance autonomy

Dickinson (1987) says that it is particularly appropriate:

● as a complement to self-instruction, for example for students following a self-access
course.
● for initial assessment, whether to place students, to diagnose their problems, or to
assess their progress.
● to build autonomous and self-directed language learners, an objective which Dickinson
says is an important educational objective in itself.
● to reduce the assessment burden on the teacher.

SUMMATIVE AND FORMATIVE ASSESSMENT


Formative (assessment FOR learning) -> aims to assist learning by evaluating how
effective it is while it is still in progress.
Summative (assessment OF learning) -> assesses learning and passes judgment on the
learner's current performance.
TEST TYPES

Aptitude tests:
Designed to predict who will be a successful language learner, they are based on
the factors which are thought to determine an individual's ability to acquire a second or foreign
language (rather than an individual's ability to use a language at the time of testing).

● are usually large-scale tests, taking a long time to administer and with a number of
components, each testing a different facet of language.
● are forward-looking tests, concerned with future language learning, not with previous
learning.
● try to predict learning ability over a long period.
---
In many proficiency tests, where students complete a number of different papers, there is a
mixture of direct and indirect as well as discrete-point and integrative testing.

★ The degree of agreement between two examiners about a mark for the same language
sample is known as inter-rater reliability.
★ The degree of agreement between a single examiner's marking of the same language
sample on two separate occasions is known as intra-rater reliability.
BOTH are low in first-generation tests.
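As a rough numerical illustration (not part of the original notes, and using invented band scores), inter-rater agreement can be quantified as the proportion of samples on which two examiners award the same band:

```python
# Minimal sketch: percentage agreement between two raters.
# The band scores below are invented example data; real studies would
# also use a chance-corrected statistic such as Cohen's kappa.

def percent_agreement(rater_a, rater_b):
    """Proportion of samples on which two raters assign the same band."""
    assert len(rater_a) == len(rater_b), "raters must mark the same samples"
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

# Bands awarded by two examiners to the same ten writing samples:
examiner_1 = [5, 4, 6, 5, 3, 7, 5, 4, 6, 5]
examiner_2 = [5, 4, 5, 5, 3, 6, 5, 4, 6, 4]

print(f"Inter-rater agreement: {percent_agreement(examiner_1, examiner_2):.0%}")
# The same calculation applied to one examiner's marks on two occasions
# would estimate intra-rater agreement.
```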

First generation tests:


Candidates are asked to complete various questions such as compositions, translations,
or simple question-and-answer activities devoid of context, e.g. "Write about a holiday you
enjoyed (200 words)", with no indication of audience or purpose.
● It is common for two different examiners to mark the same test in very different ways
(inter), or for the same examiner to mark the same test differently on two different
occasions (intra).
● Low inter-rater and intra-rater reliability.

Second generation tests:


Questions in second generation testing normally measure one item of language, known as a
discrete point (in contrast with the integrative language tested in the first generation tests
above). Since each question tests one tiny aspect of language (e.g. a verb form, a preposition,
etc.), tests are often very long, consisting of many questions. Cloze passages are a typical
example.
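The mechanics of a cloze passage can be sketched in a few lines. This is an illustration rather than part of the notes; the sample text and the every-fifth-word deletion rate are assumptions:

```python
# Sketch of the classic cloze procedure: delete every nth word and
# replace it with a numbered blank. The sample text and n=5 deletion
# rate are invented for illustration.

def make_cloze(text, n=5):
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = f"({len(answers)})____"
    return " ".join(words), answers

passage = ("Testing is the way assessment is formalized in specific "
           "tools such as exams quizzes or checklists")
cloze_text, answers = make_cloze(passage)
print(cloze_text)
print("Deleted words:", answers)
```

Restoring the deleted words requires processing grammar, vocabulary and context together, which is why cloze counts as more integrative than a single discrete-point item.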

Third generation tests:


Authentic reading with some transfer of information, such as correcting notes taken from a text,
writing a note with instructions about some aspect of household organization, listening to
an airport announcement to find the arrival time of a plane, or giving someone spoken
instructions for how to get to a house.

Second generation testing is very reliable; third generation testing is more valid.

COMPETENCE VS. PERFORMANCE

Competence -> the ideal knowledge of language all mature speakers hold in their minds;
the speaker-listener's knowledge of his language.
Performance -> the imperfect realization of the language that comes out when it is used;
the actual use of language in concrete situations.

USAGE VS. USE


- Language usage refers to the rules and structures used for constructing language
- Language use considers the communicative meaning and utility of language

The type of output one might expect from a learner whose instruction has consisted of
grammatical rules, and who will therefore be asked to produce sentences to illustrate
these rules, would be examples of usage. These examples can be used as an indication of
the learner's current state of competence, but will not necessarily indicate anything about the
learner's possible performance.

Performance teaching, and therefore performance testing, require examples of language use,
not usage.

DISCRETE POINT VS. INTEGRATIVE ASSESSMENT


Discrete-point testing: tests one thing at a time (asking students to choose the correct tense of
a verb)
Techniques:
- Transformation (rewrite a sentence in another way)
- Completion/addition (complete a blank space with xyz)
- Combination (combine sentences)
- Rearrangement (order the words)
- Correct/incorrect (circle correct or incorrect)
- Matching elements (match words across boxes or columns)
*Indirect assessment is usually carried out through a battery of many items, each one of which
only tests one small part of the language. Each item is known as a discrete-point item.

Integrative/global testing: expects students to use a variety of language at any one given time
(writing composition or making an oral presentation)
● Most of its items require subjective assessment (an assessor to make a judgment
according to some criteria and experience)

DIRECT AND INDIRECT ASSESSMENT

Direct -> if it asks candidates to perform the communicative skill that is being tested.
● Tries to be as much real-life language use as possible.
● Uses examples of performance as an indicator of communicative competence.
● Tests use testing tasks of the same type as language tasks in the real-world.

Techniques:
- Speaking
- Interviewer questioning candidates about themselves
- Using pictures for candidates to compare and contrast
- Role-playing
- Writing
- Writing stories or compositions
- Information leaflets about their school or a place in their town
- Transactional letters where candidates reply to a job advertisement
- Reading
- Multiple choice questions to test comprehension of a text
- Choosing the best summary of a paragraph or a whole text
- Transferring written info to charts, graphs, maps, etc.
- Listening
- Completing charts with facts and figures from a listening text.
- Identifying which speaker says what.
- Following directions on a map and identifying the correct house or
place.
Indirect -> tries to measure a student’s knowledge and ability by getting at what lies beneath
their receptive and productive skills.
● Try to find out about a student's language knowledge through more controlled items,
such as multiple choice questions or grammar transformation items.
● Assesses competence without eliciting performance (e.g. multiple choice, because no
production of language use from the language learner is required)
● Usually carried out through a battery of many items, each one of which only tests one
small part of the language. Each item is known as a discrete-point item.

Techniques:
- Multiple choice questions
- Cloze procedures
- Transformation or paraphrase (rewrite a sentence)
- Sentence reordering (scrambled sentences)
- Sentence fill-ins (complete the sentence logically)
- Choosing the correct tense of verbs in sentences and passages
- Finding errors in sentences

Indirect assessment contrasts with communicative assessment, which sees language use as
basically indivisible and more than the sum of its parts. Testers require items which test
the ability to combine knowledge of different parts of the language; these items are
known as integrative or global items. Examples might include responding to a letter, filling
in a form, etc.

Validity: A test is valid if it tests what it is supposed to test.

It is not valid if, for example, it is meant to test writing ability but includes a question that
requires knowledge of history or biology, unless all students share this knowledge.

Reliability: A good test should have consistent results.


If the same group of students took the same test twice within two days, they should get the
same results on each occasion. If they took another similar test, the results should be
consistent.
● enhanced by making the test instructions absolutely clear, restricting the scope for
variety in the answers, and making sure that test conditions remain constant.
● It also depends on the people who mark the tests. Clearly a test is unreliable if the result
depends to any large extent on who is marking it.
For a test to be reliable and valid, it should:
- Create fair conditions to every examinee
Students shouldn't need to have specialist knowledge.
- Replicate real-life interaction
Tests should be as realistic as possible, even if they are not authentic.

Utility: A test with high utility will give a lot of feedback to assist in the planning of the
rest of a course or future courses.
● Informal or classroom tests should have high utility, telling both the learner and the
teacher where problems exist.
● Formal examinations traditionally have low utility, with little specific feedback being given
to the learner.
● The more detailed this feedback is, the more useful it is as a tool for the future.

Discrimination: The ability of a test to discriminate between stronger and weaker
students.
● Desirable in a placement test or formal examination. (if a placement test is administered
on which everyone scores 90%, it has obviously failed to discriminate and the results
cannot be used as the basis for forming groups; however, if all the students score 90%
or more on a class progress test, the test shows that the material has been well
assimilated.)
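One common way to put a number on this is the item discrimination index, which compares how the strongest and weakest students fared on a single item. This sketch is an illustration with invented figures, not part of the original notes:

```python
# Sketch: a simple discrimination index for one test item.
# D = (correct in upper group - correct in lower group) / group size.
# D near 1 means strong students did much better on the item;
# D near 0 means the item failed to discriminate.
# The figures below are invented example data.

def discrimination_index(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size

# Of the 10 strongest students, 9 answered the item correctly;
# of the 10 weakest, only 3 did:
d = discrimination_index(9, 3, 10)
print(f"D = {d}")  # 0.6: this item discriminates well
```

An item on which everyone scores correctly gives D = 0, mirroring the placement-test example above: no basis for forming groups.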

Practicality: It refers to the efficiency of the test in physical terms


- Does it require a lot of equipment?
- Does it take a lot of time to set, administer or mark?
- Does it require a lot of people to administer?

OBJECTIVE VS SUBJECTIVE ASSESSMENT

Objective -> It refers to test items that can be marked clearly as right or wrong (multiple choice
item)

● Receptive skills (reading and listening) lend themselves to objective marking

Subjective -> requires that an assessor makes a judgment according to some criteria and
experience.

● The issue is to achieve some agreement over marks, both between different markers
(inter-rater reliability) and with the same marker at different times (intra-rater reliability).
● Productive skills (speaking and writing) lend themselves to subjective marking
Most integrative test elements (writing composition or making an oral presentation) require
subjective assessment.

RECEPTIVE VS. PRODUCTIVE SKILLS

Receptive skills -> reading and listening


● Lend themselves to objective marking

Productive skills -> speaking and writing


● Lend themselves to subjective marking

BACKWARD- LOOKING AND FORWARD-LOOKING ASSESSMENT

Backward-looking -> competence tests (to see to what degree previously taught material has
been assimilated by the learner)
Forward-looking assessment-> Third generation tests are better linked to the future use of
language, and their assessments of real language use also show mastery of a performance
based syllabus.

CONTEXTUALIZED VS. DISEMBODIED LANGUAGE

Contextualized language:
● Integrative items need a full context in order to function. The closer the items
in an integrative test are to simulating real-world language tasks, the fuller the
context must be.
- information about communicative purpose, role relationships, channel of
communication, etc.

Disembodied language:
● Has little or no context (multiple choice test, based on language usage)
● Items bear little relevance to each other, with no purpose other than as part of a test

BAND SCALES (CRITERIA)

Holistic -> give overall descriptions of ability. Students’ performance is matched to one of the
bands (a number).

(Figure: sample band scales, holistic on the left, analytic on the right.)

Analytic -> separate out aspects of language performance into individual scales, giving a profile
of performance.
● Analytic scales have more bands
● The more bands that are used, the more difficult it is to write reliable descriptors for
each band.
