Assessment

Chapter 2
Source: Brown, H. Douglas (2004). Language Assessment: Principles and Classroom Practices. New York: Longman.
Principles of Language Assessment

✦ How do you know if a test is effective?
Principles of Language Assessment
PRACTICALITY

RELIABILITY

VALIDITY

AUTHENTICITY

WASHBACK
1. PRACTICALITY

An effective test is practical when ...

• It is not excessively expensive
• It stays within appropriate time constraints
• It is relatively easy to administer
• It has a scoring/evaluating procedure that is specific and time-efficient
1. PRACTICALITY
Let's consider this practicality checklist …
2. RELIABILITY

A reliable test is consistent and dependable.

If you give the same test to the same student or matched students on two different occasions, the test should yield similar results.
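
One way to put a number on "similar results" is to correlate the scores from the two occasions. The sketch below is only an illustration: the slides do not prescribe a statistic, the Pearson correlation is one common choice assumed here, and the score lists are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five students on two occasions
first_sitting = [62, 75, 81, 90, 58]
second_sitting = [65, 73, 84, 88, 60]

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 suggests the two sittings rank students in much the same way; a value near 0 suggests the results are not dependable.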
2. RELIABILITY

What makes a test unreliable?

• Fluctuations in the student
• Fluctuations in the scoring
• Fluctuations in test administration
• Fluctuations in the test itself
2. RELIABILITY

Fluctuations in the student

Most commonly caused by:


• Temporary illness
• Fatigue
• A “bad day”
• Anxiety
• Other physical or psychological factors
• Test-taking strategies
2. RELIABILITY
Fluctuations in the scoring

Most commonly caused by: human error, subjectivity and bias on the part of the teacher.

Inter-rater reliability:
• Suffers when two or more scorers yield inconsistent scores of the same test
• Possible causes: lack of attention to scoring criteria, inexperience, inattention, bias

Intra-rater reliability:
• A common issue for classroom teachers
• Possible causes: unclear scoring criteria, fatigue, bias towards “good” or “bad” students, carelessness
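
Agreement between two scorers can be checked with a simple calculation. The sketch below uses Cohen's kappa, which compares observed agreement with the agreement expected by chance. This statistic is an assumption chosen for illustration, not something the slides name, and the band labels and ratings are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same scripts."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of scripts")
    n = len(rater_a)
    # Proportion of scripts on which the two raters gave the same band
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected if each rater assigned bands independently
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single band")
    return (observed - expected) / (1 - expected)


# Hypothetical bands given by two teachers to the same six essays
teacher_1 = ["A", "B", "B", "C", "A", "C"]
teacher_2 = ["A", "B", "C", "C", "A", "B"]

kappa = cohens_kappa(teacher_1, teacher_2)
print(f"kappa: {kappa:.2f}")  # prints "kappa: 0.50"
```

The same function can be applied to one teacher's first and second marking of the same scripts as a rough check on intra-rater consistency.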
2. RELIABILITY
Fluctuations in test administration

Most commonly caused by:


• Conditions of the test administration
• Photocopying variations
• Amount of light in the room
• Variations in temperature
• Condition of desks and chairs
• Etc.
2. RELIABILITY

Fluctuations in the test itself

Most commonly caused by:


• Tests that are too long, causing fatigue in test-takers
• Effect of timed tests on some students
• Poorly written test items (ambiguous)
3. VALIDITY
“The extent to which inferences made from assessment results are appropriate, meaningful and useful in terms of the purpose of the assessment” (Gronlund, 1998, p. 226)

Types of validity evidence in a test:

• Content-Related Validity
• Criterion-Related Validity
• Construct-Related Validity
• Consequential Validity
• Face Validity
3. VALIDITY
Content-Related Validity

A test has content validity when:


• It samples the subject matter being taught.
• It requires the learner to perform the behavior being measured.
• It clearly measures the expected achievement.

Situation: The teacher plans a reading comprehension test.

1. Teacher asks students to underline unknown words and match them to a given
definition.

2. Teacher asks students to identify examples of Simple Past actions and write new
sentences using that tense.

3. Teacher asks students to write a short opinion on the topic of the text.

Which of the three has content validity?


3. VALIDITY
Criterion-Related Validity
A test has criterion validity when:
• The criterion of the test has actually been reached

A classroom test designed to assess mastery of a function of the language will have
criterion validity if test scores are corroborated either by observed subsequent
behavior or by other communicative measures of the function in question.

Example:

Students take a test on comparisons. Marks are good, but the teacher notices that
students are unable to compare things in communicative situations in later
lessons.

Did the test administered have criterion validity?


3. VALIDITY
Construct-Related Validity
A Construct is:
• A theory, hypothesis or model that attempts to explain observed
phenomena.
Proficiency and Communicative Competence are linguistic constructs.

The question to ask when checking construct validity is:

Does this test tap into the theoretical construct as it has been defined?

Example:
You have created a simple written vocabulary test, covering the content of a recent
unit, that asks students to correctly define a set of words. Your chosen items may be a
perfectly adequate sample of what was covered in the unit, but if the lexical objective
of the unit was the communicative use of vocabulary, then the writing of definitions
certainly fails to match a construct of communicative language use.
3. VALIDITY
Consequential Validity

It involves all the consequences of a test:


• Its accuracy in measuring intended criteria.
• Its impact on the preparation of test-takers.
• Its effect on the learner (*)
• The (intended and unintended) social consequences of a test's
interpretation and use.

(*): Gronlund (1998, pp. 209-210) encourages teachers to consider the effect of
assessments on students' motivation, subsequent performance in a course, independent
learning, study habits and attitude toward school work.
3. VALIDITY
Face Validity

It refers to:
The degree to which a test looks right and appears to measure the
knowledge or abilities it claims to measure, based on the subjective
judgement of the examinees who take it, the people who decide on its
use and other observers.

Face Validity means that the students perceive the test to be valid.
Face Validity asks the question:
Does the test, on the face of it, appear from the learner's
perspective to test what it is designed to test?
3. VALIDITY
Face Validity

When do you, as a student, consider a test to be valid?
3. VALIDITY
Face Validity
Face validity will likely be high if learners encounter:

A well-constructed, expected format with familiar tasks

A test that is clearly doable within the allotted time limit

Items that are clear and uncomplicated

Directions that are crystal clear

Tasks that relate to their course work (content validity)

A difficulty level that presents a reasonable challenge


Practicality, Reliability, Validity

✦ If in your language teaching you can attend to the practicality, reliability and validity of tests of language, whether those tests are classroom tests related to a part of a lesson, final exams or proficiency tests, then you are well on the way to making accurate judgements about the competence of the learners with whom you are working.
PAIR WORK: Consider the following quiz on English articles for a
high-beginner level of a conversation class (listening and
speaking) for English learners
4. AUTHENTICITY
“The degree of correspondence of the characteristics of a given language test task to the features of a target language task” (Bachman and Palmer, 1996, p. 23)

Authenticity may be present in a test when …

• The language in the test is as natural as possible
• Items are contextualized rather than isolated
• Topics are meaningful (relevant, interesting) for the learner
• Some thematic organization to items is provided, such as through a story line or episode
• Tasks represent, or closely approximate, real-world tasks
5. WASHBACK
“The effect of testing on teaching and learning” (Hughes, 2003, p. 1)

Forms of washback may include …

• The information that washes back to students in the form of useful diagnoses of strengths and weaknesses
• The effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment
5. WASHBACK

• Informal performance assessment is by nature more likely to have built-in washback effects because the teacher is usually providing interactive feedback.

• Formal tests can also have positive washback, but they provide no washback if the students receive a simple overall numerical score.
5. WASHBACK
The challenge to teachers is to create classroom tests that serve as learning devices through which washback is achieved.

• Correct answers need to be praised
• Incorrect answers can provide insight into further work
• Teachers can suggest strategies for success

Washback enhances a number of principles of language acquisition: intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, etc.
TEST ANALYSIS
Applying principles to the evaluation of classroom tests
Practical work
✦ Type of assessment
✦ Norm-referenced or criterion-referenced test?
✦ Type of test?
✦ Is it a practical test?
✦ Is it a valid test?
✦ Is it a reliable test?
✦ Is it an authentic test?
✦ Is there any washback effect you may predict?