
Criteria of Good Test: Validity

This paper is presented to fulfill the assignment of English Instructional Evaluation

Lecturer: Pryla Rochmahwati, M.Pd

by:

The 4th Group of TBI D

Herwin Tri Ananda 204180097


Luthfiatul Azizah Nuril Anwar 204180104
Muhammad Fajar Eko Saputro 204180109
Nadiya Dwi Cahyani 204180116
Ni’ma Nurin Namiroh 204180120

ENGLISH EDUCATION DEPARTMENT


FACULTY OF EDUCATION AND TEACHER TRAINING
STATE INSTITUTE OF ISLAMIC STUDIES PONOROGO
2020
PREFACE

All praise is due to Allah SWT, the creator and the protector of this universe. Because of
His grace, guidance, and blessing, the writers could finish arranging this paper. Peace and
salutation be always upon the noble prophet Muhammad SAW, who has guided us from
darkness to light.

This paper discusses validity as a criterion of a good test. The writing of this paper is
based on materials we gathered from various sources. We have arranged it in systematic
steps so that it can be understood easily.

We realize that this paper is still far from perfect and needs a lot of improvement, so
criticisms and suggestions for its betterment are welcome. We hope it will be useful for us
and for the readers in improving our knowledge.

Ponorogo, September 20th 2020

Authors

TABLE OF CONTENTS

Preface
Table of Contents
Chapter I: Introduction
A. Background
B. Problem Formulation
C. Purpose
Chapter II: Discussion
A. The Definition of Validity
B. Content Validity
C. Empirical Validity
D. Factors Influencing Validity
Chapter III: Closing
A. Conclusion
B. Suggestion
References

CHAPTER I

INTRODUCTION

A. Background
Language testing has been defined as one of the core areas of applied
linguistics because it tackles two of its fundamental issues: the need to define and
reflect on the appropriateness of Second Language Acquisition models and constructs
by data analysis from language tests and the importance of facing the ethical
challenge deriving from the social and political role language tests play nowadays.
Language testing thus has a twofold impact in a variety of contexts. In the first
instance, it constitutes a scientific impulse for which research is needed to develop
and implement the technical design of tests. Secondly, language testing has also
become a subject of debate because the use and interpretation of test results
raise ethical issues concerning the concept of ‘fairness’ in the construction,
administration, evaluation, and interpretation of language tests.

In fact, language tests are always designed and used to make decisions on the
basis of a process in which information about test takers is gathered from an observed
performance under test conditions. This inevitably leads to the development of codes
of ethics in educational testing environments and to the elaboration of theories of
validity and validation.

Therefore, this paper attempts to show why we need to understand validity.

B. Problem Formulation
1. What is the definition of validity?
2. What is content validity?
3. What is empirical validity?
4. What factors influence validity?

C. Purpose
1. To know the definition of validity.
2. To know the content validity.
3. To know the empirical validity.
4. To know the factors influencing validity.

CHAPTER II

DISCUSSION

A. The Definition of Validity

Validity refers to the ability of the test to measure what it purports to measure.
Validity in language testing has traditionally been understood to mean discovering
whether a test ‘measures accurately what it is intended to measure’ (Hughes,
1989:22), or uncovering the ‘appropriateness of a given test or any of its component
parts as a measure of what it is purposed to measure’ (Henning, 1987:170). Heaton
(1975:153) defines the validity of a test as “the extent to which it measures what it is
supposed to measure and nothing else”. Harris (1969) defines validity with reference
to two questions: “(1) What precisely does the test measure? and (2) How well does
the test measure?”

To use tests wisely we need information about what types of inferences can
reasonably be made from test scores. This is a matter of validity, which refers to the
appropriateness, meaningfulness, and usefulness of the specific inferences made
from test scores. Test validation is the process of accumulating evidence to support
such inferences (APA 1985:9). In order to support or justify the inferences we make
about the quality or qualities of the test takers, we must first clearly define the
construct, and then we need to develop an argument that the test, the test tasks, and
the test scores are relevant not only to the construct but also to the test purpose
(Douglas, 2010). Thus, the notion of validity raises the question of the extent to
which the score is relevant and useful to any decisions that might be made on the
basis of scores, and whether the use of the test to make those decisions has positive
consequences for test takers (Fulcher, 2010).

B. Content Validity

According to Hughes (1995:27), a test is said to have content validity if its
contents constitute and represent a sample of the language skills, structures, etc. with
which it is meant to be concerned. A test is said to have face validity if it looks as if it
measures what it is supposed to measure. A valid test is a test which affords
satisfactory evidence of the degree to which the students are actually reaching the
desired objectives of teaching, these objectives being specifically stated in terms of
tangible behavior.

Measuring the content validity of instruments is important. This type of validity
can help to ensure construct validity and give readers and researchers confidence
in the instruments. Content validity refers to the degree to which the instrument
covers the content that it is supposed to measure. For content validity, two judgments
are necessary: the measurable extent of each item in defining the trait, and the extent
to which the set of items represents all aspects of the trait under study.

The Content Validity Index (CVI) developed by Waltz and Bausell can be used.
The experts are asked to rate each item for relevance, clarity, simplicity,
and ambiguity on a four-point scale (Table 1).

Table 1. Four-point rating scale for each criterion

1. Relevance: 1 = not relevant; 2 = item needs some revision; 3 = relevant but needs minor revision; 4 = very relevant

2. Simplicity: 1 = not simple; 2 = item needs some revision; 3 = simple but needs minor revision; 4 = very simple

3. Clarity: 1 = not clear; 2 = item needs some revision; 3 = clear but needs minor revision; 4 = very clear

4. Ambiguity: 1 = doubtful; 2 = item needs some revision; 3 = no doubt but needs minor revision; 4 = meaning is clear

Evaluation of content validity helps the researcher to provide reliable evidence that all
the important aspects and key concepts of the subject matter are included in the
evaluation, and that all components of the tool are acceptable in the view of the
expert panel.
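As an illustration (not taken from the source), the item-level CVI is commonly computed as the proportion of experts who rate an item 3 or 4 on the four-point scale. A minimal sketch in Python, using hypothetical expert ratings:

```python
# Item-level Content Validity Index (I-CVI): the proportion of experts
# who rate an item 3 or 4 on the four-point relevance scale.
def item_cvi(ratings):
    """ratings: list of expert ratings (1-4) for one item."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

# Hypothetical ratings from five experts for three items.
items = {
    "item_1": [4, 3, 4, 4, 3],  # all five rate 3 or 4 -> I-CVI = 1.0
    "item_2": [2, 3, 4, 3, 2],  # three of five rate 3 or 4 -> I-CVI = 0.6
    "item_3": [4, 4, 3, 4, 2],  # four of five rate 3 or 4 -> I-CVI = 0.8
}

for name, ratings in items.items():
    print(name, item_cvi(ratings))
```

Items whose index falls below an agreed threshold would then be revised or dropped, following the rating scale in Table 1.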

C. Empirical Validity

1. Concurrent Validity

Concurrent validity is a measure of how well a particular test correlates
with a previously validated measure. It is commonly used in the social sciences,
psychology, and education. The tests are for the same, or very closely related,
constructs, and allow a researcher to validate new methods against a tried and
tested stalwart.

IQ tests, emotional quotient tests, and most school grading
systems are examples of established tests that are regarded as having
high validity. One common way of looking at concurrent validity is to measure
a new test or procedure against a standard benchmark.

a. An Example of Concurrent Validity

Researchers give a group of students a new test designed to
measure English aptitude. They then compare this with the test scores
already held by the school, a recognized and reliable measure of
English ability. Cross-referencing the scores for each student
allows the researchers to check if there is a correlation, evaluate the
accuracy of their test, and decide whether it measures what it is supposed
to. The key element is that the two measures were compared at about the
same time. If the researchers had measured the students' aptitude,
implemented a new educational program, and then retested them
after six months, this would be predictive validity.
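The cross-referencing step above amounts to computing a correlation coefficient between the two sets of scores. A minimal sketch of the standard Pearson formula (the scores are hypothetical, not from the source):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: scores on the new aptitude test vs. the school's
# established scores for the same five students, taken at the same time.
new_test = [55, 68, 74, 80, 91]
established = [52, 70, 71, 84, 88]

print(round(pearson_r(new_test, established), 3))  # -> 0.972
```

A coefficient near 1 suggests the new test ranks students much as the established measure does; a low coefficient would cast doubt on the new test.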

b. The Weaknesses of Concurrent Validity

Concurrent validity is regarded as a fairly weak type of validity


and is rarely accepted on its own. The problem is that the benchmark test
may have some inaccuracies and if the new test shows a correlation, it
merely shows that the new test contains the same problems. For
example, IQ tests are often criticized, because they are often used
beyond the scope of the original intention and are not the strongest
indicator of all-round intelligence. Any new intelligence test that showed
strong concurrent validity with IQ tests would, presumably, contain the

same inherent weaknesses. Despite this weakness, concurrent validity is
a stalwart of education and employment testing, where it can be a good
guide for new testing procedures. Ideally, researchers initially test
concurrent validity and then follow up with a predictive validity based
experiment, to give a strong foundation to their findings.

2. Predictive Validity
Predictive validity is the degree of correlation between the scores on a
test and some other measure that the test is designed to predict. Crocker and
Algina (1986, p.224) define predictive validity as “the degree to which test
scores predict criterion measurements that will be made at some point in the
future.” In terms of a proficiency test, predictive validity refers to the extent to
which a test can be appropriately used to draw inferences regarding
proficiency. Predictive validity according to Alderson, Clapham, and Wall
(1995), can be tested for the same examinees by comparing a test score with
another measure, which is collected after the test has been given. It is common
to look for predictive validity in a proficiency test because predictive validity
analyses are important in checking whether the main objective of the
proficiency exam, which is to evaluate an examinee’s ability to successfully
perform in a future course, is achieved or not. Brown (2004, p.24) asserts that
the predictive validity of an assessment becomes important in the case of
placement tests, admissions assessment batteries, language aptitude tests, and
the like. He also argues that the assessment criterion in such cases is not to
measure concurrent ability but to assess (and predict) a test-taker's likelihood
of future success.

a. An Example of Predictive Validity

An example of predictive validity arises when the most selective higher
education institutions find that the existing school examination system no
longer provides evidence of differences in individual merit among the
highest-attaining candidates. Predictive validity can be examined by
calculating the correlation coefficient between scores on the selection test
and scores on an outcome variable such as degree classification, or the
score on a test at the end of the first year of the degree course.
D. Factors Influencing Validity

Controlling all possible factors that threaten research validity is the
primary responsibility of every good researcher. There are two kinds of validity:

1. Internal Validity

Internal validity is affected by flaws within the study itself, such as
failure to control some of the major variables (a design problem) or problems
with the research instrument (a data collection problem). Here are some factors
which affect internal validity:
a. Subject variability
b. Size of subject population
c. Time given for the data collection or experimental treatment
d. History
e. Attrition
f. Maturation

g. Instrument/task sensitivity

2. External Validity

External validity is the extent to which you can generalize your findings to a
larger group or other contexts. If your research lacks external validity, the findings
cannot be applied to contexts other than the one in which you carried out your
research. "Findings can be said to be externally invalid because [they] cannot
be extended or applied to contexts outside those in which the research took
place" (Seliger & Shohamy 1989:95). Here are seven important factors that affect
external validity:
a. Population characteristics (subjects)
b. Interaction of subject selection and research
c. Descriptive explicitness of the independent variable
d. The effect of the research environment
e. Researcher or experimenter effects
f. Data collection methodology
g. The effect of time

CHAPTER III

CLOSING

A. Conclusion

Validity is the extent to which the inferences or decisions we make on the basis of
test scores are meaningful, appropriate, and useful. In other words, a test is said to be
valid to the extent that it measures what it is supposed to measure or can be used for
the purposes for which it is intended.

The matter of concern in testing is to ensure that any test employed is valid for
the purpose for which it is administered. Validity tells us what can be inferred from
test scores. Validity is a quality of test interpretation and use. If test scores are
affected by abilities other than the one we want to measure, they will not be
meaningful indicators of that particular ability.

B. Suggestion

We, as the authors of this paper, know that it is still far from perfect, so we
need suggestions from the readers for its improvement.

REFERENCES

Bachman, Lyle and Adrian Palmer. 1996. Language Testing in Practice: Designing and
Developing Useful Language Tests. New York: Oxford University Press.

Brown, Douglas. 2003. Language Assessment: Principles and Classroom Practice. San
Francisco, California.
