
THE PRINCIPLES OF LANGUAGE

ASSESSMENT

Submitted by:
Group 4
Elfika Amanda (170203157)
Rena Angguntia (170203029)
Sulthan Maulidan (170203131)

Subject: ENGLISH LANGUAGE ASSESSMENT

Lecturer: Erry Zul Akbar, S.Pd.I, M.Pd

ENGLISH LANGUAGE EDUCATION DEPARTMENT


EDUCATION AND TEACHER TRAINING FACULTY
AR-RANIRY STATE ISLAMIC UNIVERSITY
DARUSSALAM - BANDA ACEH
2020 M / 1441 H
PREFACE

Peace be upon you, and Allah's mercy and blessings

Praise be to Allah SWT for giving us the ease to complete this paper on time. Without His help, we would certainly not have been able to finish this paper well. May shalawat and greetings be abundantly bestowed upon our beloved Prophet Muhammad, to whose shari'a we will turn in the hereafter.

The authors also thank God Almighty for the abundance of His blessings of health, both physical and mental, which enabled them to complete this English language assessment paper, entitled THE PRINCIPLES OF LANGUAGE ASSESSMENT.

The authors certainly realize that this paper is far from perfect and that mistakes and flaws remain in it. For this reason, the authors welcome criticism and suggestions from readers, so that this paper can become better. For any errors that remain, the authors apologize profusely.

Thus, we hope this paper can be useful. Thank you.

Group 4
TABLE OF CONTENTS

Preface
Table of Contents
Chapter I
    A. Background of Study
    B. Problem of Study
    C. Aim of Paper
Chapter II
    A. Practicality of a Test
    B. Reliability of a Test
        a. Test-administration reliability
        b. Student reliability
        c. Marker reliability
        d. Test reliability
    C. Validity of a Test
        a. Construct validity
        b. Content validity
        c. Face validity
        d. Criterion validity
Chapter III
    A. Summary
    B. Suggestion
Bibliography
CHAPTER I
INTRODUCTION

A. Background

This paper explores how the principles of language assessment can and should be applied to formal tests, while recognizing that these principles also apply to assessments of all kinds, oral and written alike. These principles will be used to evaluate an existing, previously published, or newly created test. They are established in five aspects:

1. Practicality
2. Reliability
3. Validity
4. Authenticity
5. Washback

B. Problem Formulation

To clarify the direction of the study in this paper, the following problem statements are formulated.

1. What are the principles of language assessment?
2. What do practicality, validity, and reliability mean?
3. What are the types of validity and reliability?
4. How can we determine the reliability and validity of a test?

C. Aim of Paper

1. To know the principles of language assessment
2. To know what practicality, validity, and reliability are
3. To know the types of validity and reliability
4. To be able to determine the reliability and validity of a test

CHAPTER II

A. Practicality of a test
The practicality of a test means that the test should be as easy as possible to administer. It refers to the management of time, effort, and money in testing. In other words, a test should be easy to design, easy to conduct, and easy to mark, while remaining valid. For instance, a test that is extremely expensive is impractical because students may not be able to spend that much money. A test of language proficiency that takes a student five hours to complete is impractical because it consumes more time than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test-takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations. In brief, an easy test should come with an easy scoring rubric as well. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer (Brown, 1994, p. 19).
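To make the time cost concrete, the short Python sketch below estimates the marking workload a slow-to-score test would impose; all numbers are invented for illustration, not drawn from any real test.

    # Hypothetical back-of-the-envelope check of marking workload.
    n_students = 300         # test-takers sitting the exam
    minutes_per_paper = 45   # examiner time needed to mark one paper
    n_examiners = 3

    total_hours = n_students * minutes_per_paper / 60
    hours_each = total_hours / n_examiners
    print(f"{total_hours:.0f} marker-hours in total, {hours_each:.0f} per examiner")
    # 225 marker-hours in total, 75 per examiner -> impractical for a classroom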

In terms of administration, the test should suit the test-takers' surroundings, culture, and area. Students in rural areas should not be given a computer-based test that requires them at least to know how to operate a computer; in such a case, how could these students complete the test without knowing how to interact with the machine? The test must also be valid, which basically means that it must aim to measure the particular skill in question and not stray from what is meant to be measured. This will be discussed further under the validity of the test.

B. Reliability of a test
Reliability refers to how consistently a test measures a particular characteristic. If a person takes the test again, he or she should obtain a score that is not far from the previous one (neither much higher nor much lower). Several factors affect reliability, and they are discussed in turn below.
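As a concrete illustration of this notion of consistency, the Python sketch below (using invented scores, not data from any real administration) computes the test-retest correlation between two sittings of the same test; a value close to 1.0 indicates that the test ranks students consistently.

    import statistics

    def pearson(xs, ys):
        # Pearson correlation: covariance divided by the product of
        # the two standard deviations.
        mx, my = statistics.mean(xs), statistics.mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    # Hypothetical scores of the same five students on two sittings.
    first_sitting = [78, 85, 62, 90, 70]
    second_sitting = [80, 83, 65, 88, 72]
    print(round(pearson(first_sitting, second_sitting), 3))  # ~0.99, consistent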

a. Test-administration reliability.

The location of the test administration can be a major source of measurement error if it affects the performance of the students. For instance, consider the possible effects of administering a test to a group of students in a quiet library with other people in it, as opposed to administering it in a quiet auditorium that contains only examinees and proctors. Clearly, the difference in surroundings could cause some variance in test scores that is not related to the purpose of the test. Similarly, the amount of space available to each student can become a factor, and noise can affect the performance of students, particularly on a listening comprehension test, but also on other types of tests if it distracts the students from the items at hand. Indeed, lighting, ventilation, weather, and other environmental factors can all serve as potential sources of measurement error if they affect the students' performance on a test.

b. Student reliability.

A large number of potential sources of error variance are directly related to the condition of the students when they take the test. These sources include physical characteristics such as differences among students in fatigue, health, hearing, or vision. For example, if five students in a class are coming down with the flu at the time they are taking a test, their poor physical health may be a variable that should be considered a potential source of measurement error. Depending on the tasks involved in a test, color blindness or other more serious physical differences could also become important sources of measurement error.

Other factors include differences among students (or in individual students over
time) in motivation, emotional state, memory, concentration, forgetfulness,
impulsiveness, carelessness, and so forth.

The experience of students with regard to test taking can also affect their performance. This experience includes the ability to comprehend almost any test directions easily, or strategies for maximizing the speed of task performance. Some students may also have topic knowledge that helps them with certain questions on a test in a way that is unrelated to the purpose of the test. By and large, the issues related to the condition of the students are the responsibility of the students themselves; however, testers must be aware that these are potential sources of measurement error and must attempt to minimize their effects.

c. Marker reliability.
Factors over which testers have considerably more control are related to the scoring procedures used. Human error in scoring is a common source of measurement error. Another source may arise in the more subjective types of tests, for example, in composition and interview ratings. The problem is that the nature of these scoring procedures can lead to evaluator inconsistencies and biases that affect the students' scores. For instance, if a rater is affected positively or negatively by the sex, race, age, or personality of the interviewee, these biases can contribute to measurement error. Perhaps one composition rater is simply tougher than the others; a student's score is then affected by whether or not the rating is done by that particular rater.
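One simple way to detect the severity effect just described is to have two raters score the same set of compositions and compare their averages; the sketch below uses hypothetical ratings invented for illustration. In practice, an agreement coefficient such as Cohen's kappa or an intraclass correlation would also be reported.

    import statistics

    # Hypothetical scores given by two raters to the same six compositions.
    rater_a = [14, 12, 17, 10, 15, 13]
    rater_b = [12, 10, 15, 8, 13, 11]

    # A consistent gap suggests one rater is simply tougher than the other.
    gap = statistics.mean(a - b for a, b in zip(rater_a, rater_b))
    print(f"mean severity gap: {gap:.1f} points")  # 2.0 -> rater_b marks lower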

d. Test reliability.

The test itself can be a source of measurement error, for instance when some of the students are familiar with its format while others are not. Item selection may also become an issue if the particular sample of items chosen is odd or falls outside the purpose of the test. The type of item chosen can likewise be a problem if that type is new to some of the students or mismatched with the purpose of the test. The number of items used on a test is another potential source of measurement error: if only a small number of items is used, the measurement will not be as accurate as with a larger number of items. For instance, a 30-item multiple-choice test will clearly measure more accurately than a 1-item test. Once that premise is accepted, differences in the accuracy of measurement for other numbers of items simply become a matter of degree. The quality of the items can also become a source of measurement error if that quality is poor or uneven. Lastly, test security can become an issue, particularly if some of the students have managed to obtain a copy of the test beforehand and prepared for that particular set of questions.

All the foregoing sources of measurement error could affect students' scores on any given test. Such effects are undesirable because they create differences in the students' scores that are unrelated to the purpose of the test. Therefore, every effort must be made to minimize them. As Cronbach (1970) pointed out, “Test theory shows how to estimate the effects of unwanted influences and permits judgments about the relation between the actual score and the score that could be obtained by thorough measurement.”
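The effect of test length on accuracy can be quantified with the classical Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened k times from the reliability of its current form. A minimal sketch follows, with a single-item reliability of 0.30 assumed purely for illustration.

    def spearman_brown(r, k):
        # Predicted reliability when a test is lengthened k times
        # (Spearman-Brown prophecy formula from classical test theory).
        return (k * r) / (1 + (k - 1) * r)

    # Assume, for illustration only, that one item has reliability 0.30.
    for n_items in (1, 5, 30):
        print(n_items, round(spearman_brown(0.30, n_items), 2))
    # 1 -> 0.3, 5 -> 0.68, 30 -> 0.93: more items, more accurate measurement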

C. Validity of a test

Validity refers to the degree to which a test or other measuring device truly measures what we intend it to measure. The test question “1 + 1 = ___” is certainly a valid basic addition question because it truly measures a student's ability to perform basic addition. It becomes less valid as a measurement of advanced mathematics because it does not address the further knowledge that advanced mathematics requires; it does not represent all of the knowledge needed for an advanced understanding of the subject.

The concept of validity becomes more complex for abstract constructs. Most of us would agree that “1 + 1 = ___” represents basic addition, but can a question represent aspects of intelligence, or of other constructs such as motivation, depression, anger, or personality traits, just as clearly? If we have a difficult time defining a construct, we are going to have an even more difficult time measuring it. A test that measures its construct accurately is said to have construct validity, and there are several related types of validity that we should be concerned with. Four of these, construct validity, content validity, face validity, and criterion validity, are discussed below.

a. Construct validity
Constructs can be characteristics of individuals, such as intelligence, obesity, job
satisfaction, or depression; they can also be broader concepts applied to organizations or social
groups, such as gender equality, corporate social responsibility, or freedom of speech.

Construct validity is about ensuring that the method of measurement matches the construct the teacher wants to measure. To achieve construct validity, teachers have to ensure that indicators and measurements are carefully developed on the basis of relevant existing knowledge. A questionnaire, for example, must include only relevant questions that measure known indicators of what is meant to be measured.

b. Content validity

Content validity assesses whether a test is representative of all aspects of the construct. To produce valid results, the content of a test, survey, or measurement method must cover all relevant parts of the subject it aims to measure. If some aspects are missing from the measurement (or if irrelevant aspects are included), the validity is threatened.

Relevant case:

An English teacher develops an end-of-semester grammar test for her class. The test should cover every form of grammar that was taught in the class. If some types of grammar are left out, the results may not be an accurate indication of students' understanding of the subject. Similarly, if she includes questions that are not related to grammar, the results are no longer a valid measure of the students' grammar knowledge.
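This kind of coverage check can be made mechanical. The Python sketch below compares the set of grammar points taught against the set actually tested; both lists are invented for illustration.

    # Hypothetical course syllabus and test blueprint.
    taught = {"present simple", "past simple", "present perfect", "conditionals"}
    tested = {"present simple", "past simple", "vocabulary"}

    missing = taught - tested     # taught but never tested -> under-coverage
    off_topic = tested - taught   # tested but never taught -> irrelevant items
    print("missing from test:", sorted(missing))
    print("irrelevant items:", sorted(off_topic))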

c. Face validity
Face validity refers to whether a test appears to be valid, i.e., whether, from its external appearance, the items seem to measure the required aspect. If a test appears to measure what the test author intends it to measure, we say that the test has face validity. Thus, face validity refers not to what the test measures, but to what the test appears to measure. The content of the test should not obviously appear inappropriate or irrelevant.

For example, a test to measure skill in speaking should contain only items on speaking. When someone goes through the items and feels that all of them appear to measure speaking skill, it can be said that the test has face validity.

d. Criterion validity
Criterion validity evaluates how closely the results of your test correspond to the results
of a different test. To evaluate criterion validity, you calculate the correlation between the results
of your measurement and the results of the criterion measurement. If there is a high correlation,
this gives a good indication that your test is measuring what it intends to measure.

Relevant case:

A university professor creates a new test to measure applicants' English writing ability. To assess how well the test really measures students' writing ability, she finds an existing test that is considered a valid measure of English writing ability and compares the results when the same group of students takes both tests. If the outcomes are very similar, the new test has high criterion validity.
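Following this case, the correlation the professor needs can be computed directly. The sketch below uses invented scores and the standard-library function statistics.correlation, which requires Python 3.10 or later.

    from statistics import correlation  # Pearson's r, Python 3.10+

    # Hypothetical scores of the same eight applicants on the new writing
    # test and on an established test already accepted as a valid criterion.
    new_test = [55, 72, 63, 81, 47, 90, 68, 75]
    existing_test = [58, 70, 60, 85, 50, 88, 65, 78]

    r = correlation(new_test, existing_test)
    print(f"criterion validity coefficient: r = {r:.2f}")  # a high r supports validity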
CHAPTER III

A. Summary

It is necessary for teachers to examine these principles in order to create a good and meaningful test. The principles of language assessment can and should be applied to formal tests, with the recognition that they also apply to assessments of all kinds, oral and written alike, and they can be used to evaluate an existing, previously published, or newly created test. These principles are established in five aspects:

1. Practicality
2. Reliability
3. Validity
4. Authenticity
5. Washback
Bibliography
Douglas Brown, Language Assessment and Classroom Practice, 1994

James Dean Brown, Testing in Language Programs, 1996

James Boyle, Stephen Fisher, Educational Testing, 2007

http://www.yourarticlelibrary.com/statistics-2/determining-reliability-of-a-test-4-methods/92574

https://opentextbc.ca/researchmethods/chapter/reliability-and-validity-of-measurement/
