
Applied Linguistics Paper: Language Assessment

Created by:
Dinda Vica Ocarina
Putri Karya Dwi B.
Language Assessment

What is Assessment?

Assessment is one of the components of teaching and learning activities. Through assessment, teachers can gain information about many aspects of their students, especially their achievement. One element that plays a crucial role in assessment is the test. A good test is constructed by considering the principles of language assessment.

They are...
- Practicality
- Validity
- Reliability
- Authenticity
- Washback

Practicality

Practicality can be defined as the relationship between the resources available for the test (i.e. human resources, materials, time, etc.) and the resources required in the design, development, and use of the test (Bachman & Palmer, 1996:35-36). Brown (2004:19) defines practicality in terms of cost, time, administration, and scoring/evaluation.

Cost

The cost of the test should not be too high: it has to stay within the available budget and avoid excessive spending.

For example, if a teacher conducts a daily test for one class of 30 students and spends IDR 500,000 on every student, the test is not practical in terms of cost.
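The arithmetic behind this judgment can be sketched as a simple budget check. The class size and per-student cost come from the example above; the available budget of IDR 1,000,000 is an assumed figure for illustration only:

```python
# Cost-practicality check for the daily-test example above.
students = 30
cost_per_student = 500_000                # IDR, from the example in the text
total_cost = students * cost_per_student  # IDR 15,000,000 for one daily test
budget = 1_000_000                        # IDR, assumed budget (hypothetical)

print(total_cost)                                              # prints 15000000
print("practical" if total_cost <= budget else "not practical")  # prints "not practical"
```

Any realistic classroom budget is far below IDR 15,000,000 per daily test, which is why the example fails the practicality principle on cost grounds.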

Time

The test should stay within appropriate time constraints; it should be neither too long nor too short.

Administration

The test should not be too complicated to conduct; it should be quite simple to administer.

Scoring/Evaluation
The scoring/evaluation process should fit into the time allocation, and scoring rubrics, answer keys, and so on should be prepared to make scoring/evaluation easy.

Validity

The validity of a test is the extent to which it measures what it is supposed to measure (Hughes, 2003:26). A test must aim to provide a true measure of the particular skill it is intended to measure, not of external knowledge and other skills at the same time (Heaton, 1990:159). Brown (2004:22-27) proposed five ways to establish validity. They are:

- Content Validity
- Criterion Validity
- Construct Validity
- Consequential Validity
- Face Validity

Content Validity

The correspondence between the contents of the test and the language skills, structures, etc. it is meant to measure has to be clear. The test items should truly represent the course objectives.

Criterion Validity

This kind of validity emphasizes the relationship between the score and the outcome. The test score should truly represent the criterion the test is intended to measure.
Criterion validity can be established in two ways:

1. Concurrent Validity
A test is said to have concurrent validity if its result is supported by other concurrent
performance beyond the assessment itself (Brown, 2004:24). For example, the validity of a high score
on the final examination of a foreign language course will be verified by the actual proficiency in the
language.

2. Predictive Validity
Predictive validity assesses and predicts a student's possible future success (Alderson
et al., 1995:180-183). For example, TOEFL or IELTS tests are intended to predict how well somebody will
be able to use his/her English in the future.

Construct Validity

Construct validity refers to the concepts or theories underlying the use of a certain ability, including language ability. Construct validity shows that the result of the test truly represents the construct being measured (Djiwandono, 1996:96).

Consequential Validity

Consequential Validity refers to the social consequences of using a particular test for a particular
purpose. The use of a test is said to have consequential validity to the extent that society benefits
from that use of the test.
Face Validity

A test is said to have face validity if it looks to other testers, teachers, moderators, and students as if it measures what it is supposed to measure (Heaton, 1990:159). In a speaking test, for instance, face validity can be shown by making speaking activities the main activities of the test, not anything else. A test can be judged to have face validity simply by looking at its items. Note that face validity can affect students in doing the test (Brown, 2004:27 & Heaton, 1988:160). To address this, the test constructor has to consider the following:
- Students will be more confident if they face a well-constructed, expected format with familiar tasks.
- Students will be less anxious if the test is clearly doable within the allotted time limit.
- Students will be optimistic if the test items are clear and uncomplicated (simple).
- Students will find it easy to do the test if the directions are very clear.
- Students will be less worried if the tasks are related to the course work (content validity).
- Students will be at ease if the difficulty level presents a reasonable challenge.

Reliability

Reliability refers to the consistency of the scores obtained (Gronlund, 1997:138). It means that if the test is administered to the same students on different occasions (with no language practice work taking place between those occasions), it produces (almost) the same results. Reliability does not really deal with the test itself; it deals with the results of the test. The results should be consistent.

Reliability falls into four kinds (Brown, 2004:21-22). They are:


- Student-related reliability
- Rater reliability
- Test administration reliability
- Test Reliability

Student-related reliability

This kind of reliability refers to temporary illness, fatigue, a bad day, anxiety, and other physical or psychological factors in the students. Because of these factors, the score a student obtains may not be his/her actual score.

Rater reliability

Rater reliability deals with the scoring process. Factors that can affect it include human error, subjectivity, and bias in the scoring process. This kind of reliability falls into two categories. They are:

- Inter-rater reliability
It occurs when two or more raters yield inconsistent scores on the same test, possibly because of a lack of attention to the scoring criteria, inexperience, inattention, or even bias.

- Intra-rater reliability
It is a common occurrence for classroom teachers, caused by unclear scoring criteria, fatigue, bias toward particular "good" or "bad" students, or simple carelessness.
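Inter-rater consistency can also be quantified. A minimal measure is the proportion of test-takers to whom two raters give identical scores; the score lists below are invented example data, and real studies would typically use a chance-corrected coefficient such as Cohen's kappa instead:

```python
def percent_agreement(scores_a, scores_b):
    """Share of test-takers who received identical scores from both raters."""
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)

# Hypothetical essay scores (1-5 band scale) from two raters for five students
rater_a = [4, 3, 5, 2, 4]
rater_b = [4, 3, 4, 2, 4]

print(percent_agreement(rater_a, rater_b))  # raters agree on 4 of 5 scripts -> 0.8
```

A low agreement figure would signal exactly the problems listed above: unclear criteria, inexperience, or rater bias.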

Test administration reliability

Test administration reliability concerns the conditions and situation in which the test is administered. To increase this kind of reliability, teachers as the administrators should consider everything related to test administration. For instance, if we want to conduct a listening test, we should provide a room with a comfortable listening environment: noise from outside should not enter the room, and the audio system should be clearly audible to all students. We even have to consider the lighting and the condition of the desks and chairs as well.

Test Reliability

Test reliability refers to the test itself: whether the test fits appropriate constraints. The test should be neither too long nor too short, and the test items should be so clear that they leave no room for ambiguity.

Authenticity

Authenticity deals with the "real world". Authenticity is the degree of correspondence of the characteristics of a given language test task to the features of a target language task (Brown, 2004:28). Teachers should construct test items that are likely to be used or applied in the real contexts of daily life. Brown (2004:28) also proposes considerations that might help to build authenticity into a test. They are:
- The language in the test is as natural as possible.
- Items are contextualized rather than isolated.
- Topics are meaningful (relevant, interesting) to the learners.
- Some thematic organization to items is provided, such as through a story or episode.
- Tasks represent, or closely approximate, real-world tasks.

Washback / Backwash

The term washback is commonly used in applied linguistics, but it is rarely found in dictionaries. However, the word backwash can be found in certain dictionaries; the Cambridge Advanced Learner's Dictionary defines it as "an effect that is not a direct result of something". In dealing with the principles of language assessment, these two words can be used more or less interchangeably. Washback (Brown, 2004) or backwash (Heaton, 1990) refers to the influence of testing on teaching and learning. The influence itself can be positive or negative (Cheng et al. (Eds.), 2008:7-11).

Example of Positive Washback

The national examination requires students to pay closer attention to the lessons, prepare everything related to the examination more thoroughly, learn the lessons by heart, and so on. It also pushes teachers to teach the lessons more intensively than before, give their students extra lessons, and so on.

Example of Negative Washback

The national examination may lead teachers to focus only on the lessons tested in the exam and to ignore other lessons that do not contribute directly to passing it. Moreover, if the national examination intimidates the students, they will feel anxious about it and feel as if they are under pressure.
Q & A from the presentation

Questions
1. What do you think of online learning?
2. What do you think of the washback of national examination that is being deleted nowadays?

Answers
1. Actually, we are talking about language assessment rather than online learning. But if the question is what we think about language assessment in online learning, then we can say that language assessment can be done in an online setting. As long as we consider the five principles, namely practicality, validity, reliability, authenticity, and washback, the language assessment can be carried out properly.
2. Well, since there is no national examination anymore, it means that there is also no washback from it.

Conclusion

- A test is good if it demonstrates practicality, good validity, high reliability, authenticity, and positive washback.
- The five principles provide guidelines for both constructing and evaluating tests.
- Teachers should apply these five principles in constructing or evaluating the tests used in assessment activities.

References
Alderson, J.C., Clapham, C., and Wall, D. 1995. Language Test Construction and Evaluation. Cambridge: Cambridge University Press
Bachman, L.F., and Palmer, A.S. 1996. Language Testing in Practice: Designing and Developing Useful Language Tests. New York: Oxford University Press
Brown, H.D. 2007. Teaching by Principles: An Interactive Approach to Language Pedagogy. New York: Pearson Education
Cheng, L., Watanabe, Y., and Curtis, A. (Eds.) 2008. Washback in Language Testing: Research Contexts and Methods. New Jersey: Lawrence Erlbaum Associates
Djiwandono, M.S. 1996. Tes Bahasa dalam Pengajaran. Bandung: ITB Bandung
Fulcher, G. and Davidson, F. 2007. Language Testing and Assessment: An Advanced Resource Book. New York: Routledge
