Characteristics of a Test
Roll No. 01
Class: MA English
Semester: IV
Session: 2013-2015
Testing is part and parcel of human life in general, and of the education system in particular. Tests should therefore have certain qualities so that they represent the true capabilities of the candidates. The following are the most important characteristics of a test.
Reliability
Validity
Practicality
Discrimination
Backwash effect
Objectivity
Comprehensiveness
Authenticity
Simplicity
Scorability
Reliability:
Reliability is one of the most important characteristics of all tests in general, and
language tests in particular. In fact, an unreliable test is worth nothing. In order
to understand the concept of reliability, an example may prove helpful. Suppose
a student took a test of grammar comprising one hundred items and received a
score of 90. Suppose further that the same student took the same test two days
later and got a score of 45. Finally suppose that the same student took the same
test for the third time and received a score of 70. What would you think of these
scores? What would you think of the student? Assuming that the student's knowledge of English cannot undergo drastic changes within such a short period, the best explanation is that something must have been wrong with the test. How could you rely on a test that does not produce consistent scores? How could you make a sound decision on the basis of such test scores? This is the essence of the concept of reliability: producing consistent scores.
Reliability, then, can be technically defined as
“The extent to which a test produces consistent scores at different
administrations to the same or similar group of examinees”
In short, the higher this consistency across administrations, the more reliable the test.
There are four common methods of estimating reliability:
Test-retest
Parallel forms
Split-half
KR-21
1. Test-Retest Method:
As the name implies, in this method a single test is administered to a single
group of examinees twice. The first administration is called “test” and the
second administration is referred to as “retest”. The correlation between the two
sets of scores obtained from testing and retesting, would determine the
magnitude of reliability. Since there is a time interval (usually more than two
weeks) between the two administrations, this kind of reliability estimate is also
known as “stability of scores over time”.
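As a sketch, the test-retest estimate is simply the Pearson correlation between the two sets of scores; the figures below are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores of five examinees on the test and on the retest
test_scores = [90, 75, 60, 85, 70]
retest_scores = [88, 78, 58, 86, 72]

reliability = pearson(test_scores, retest_scores)
print(round(reliability, 2))  # a value near 1.0 indicates stable scores
```

A correlation close to +1 indicates that the ranking of examinees was stable across the two administrations.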
Administering the same test twice is demanding, and the examinees' memory of the first administration may distort the second set of scores. Therefore, most teachers and test developers avoid this method and, due to the complexity of the task, prefer to use other methods of estimating reliability.
3. Split-Half Method:
In the test-retest method, one group of examinees was needed for two administrations. In the parallel-forms method, on the other hand, two forms of a single test were needed.
Each of these requirements is considered a disadvantage. To obviate these
shortcomings, the split-half method has been developed. In this method, a single
form of a test is given to a single group of examinees. Then each examinee’s
test is split (divided) into two halves. The correlation between the scores of the
examinees in the first half and the second half will determine the reliability of
the test scores.
Advantages of this Method:
It is more practical than others.
There is no need to administer the test twice.
There is no need to develop two parallel forms of the test.
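The split-half procedure can be sketched as follows; the item responses are hypothetical, and the final step applies the standard Spearman-Brown correction to estimate the reliability of the full-length test from the half-test correlation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical item responses (1 = correct, 0 = wrong) for four examinees
responses = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
]

half = len(responses[0]) // 2
first = [sum(r[:half]) for r in responses]   # each examinee's score on the first half
second = [sum(r[half:]) for r in responses]  # each examinee's score on the second half

r_halves = pearson(first, second)
# Spearman-Brown correction: estimates the reliability of the whole test
# from the correlation between its two halves
reliability = 2 * r_halves / (1 + r_halves)
print(round(reliability, 2))
```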
Factors Affecting Reliability:
Influence of testees
Influence of scorers/raters
Influence of test factors
Influence of Testees:
Since human beings are dynamic, the performance of students may vary from place to place and from time to time, due to factors such as:
Noise level
Distractions
Misreading of instructions
Sickness
The heterogeneity of the group of students may also influence reliability.
Sources of Errors:
Intra-rater errors:
Errors that are due to fluctuations of a single rater scoring the same test twice.
Inter-rater errors:
Errors that are due to fluctuations among different scorers scoring a single test.
INTER-RATER RELIABILITY:
Inter-rater reliability is the extent to which two or more individuals (coders or raters) agree. It assesses the consistency of how a measuring system is implemented: for example, when two or more teachers use a rating scale to rate students' oral responses in an interview. Inter-rater reliability depends on the ability of two or more individuals to be consistent; training, education and monitoring can enhance it.
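As an illustration, agreement between two raters is often summarised by simple percent agreement, or by Cohen's kappa, which corrects for agreement expected by chance; the ratings below are hypothetical.

```python
# Hypothetical ratings by two teachers of the same ten oral responses
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "fail", "pass"]

n = len(rater_a)
# Percent agreement: proportion of responses both raters scored identically
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Cohen's kappa corrects the observed agreement for chance agreement,
# estimated from each rater's marginal category proportions
categories = set(rater_a) | set(rater_b)
expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
kappa = (observed - expected) / (1 - expected)

print(round(observed, 2), round(kappa, 2))
```

Kappa is lower than raw agreement because two raters who marked at random would still agree on some responses.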
INTRA-RATER RELIABILITY:
Intra-rater reliability is a type of reliability assessment in which the same assessment is completed by the same rater on two or more occasions. The different ratings are then compared, generally by means of correlation. Since the same individual completes both assessments, the rater's later ratings may be contaminated by knowledge of the earlier ratings.
Validity:
1. Face Validity:
Face validity refers to
“The extent to which the physical appearance of the test corresponds to what
it is claimed to measure.”
For instance, a test of grammar must contain grammatical items, not vocabulary items. Of course, a test of vocabulary items may very well measure grammatical ability as well; however, such a test will not show high face validity. It should be mentioned that face validity is not a crucial or decisive type of validity: in many cases, a test that does not show high face validity has proven highly valid by other criteria. Therefore, teachers and administrators should not be overly concerned about the face validity of their tests. They should, however, be careful about the other types of validity.
Example:
People may react negatively to an intelligence test that does not appear to them to be measuring their intelligence.
2. Content Validity:
Content validity refers to
“The correspondence between the contents of the test and the contents of the
materials to be tested.”
Content Validity consists of:
Instructional Objectives.
Instructional Activities.
Test.
Of course, a test cannot include all the elements of the content to be tested.
Nevertheless, the content of the test should be a reasonable sample and
representative of the total content to be tested. In order to determine the content
validity of a test, a careful examination of the direct correspondence between
the content of the test and the materials to be tested is necessary.
3. Criterion-Related Validity:
Criterion-related validity refers to the correspondence between the results of
the test in question and the results obtained from an outside criterion. The
outside criterion is usually a measurement device for which the validity is
already established. In contrast to face validity and content validity, which are
determined subjectively, criterion-related validity is established quite
objectively.
That is why it is often referred to as empirical validity. Criterion-related validity
is determined by correlating the scores on a newly developed test with scores on
an already-established test.
Example:
An IQ test should relate positively with school performance.
Concurrent Validity:
If a newly developed test is given concurrently with a criterion test, the validity
index is called concurrent validity.
Predictive Validity:
When the criterion measure is obtained some time after the newly developed test, the correlation between the two sets of scores is called predictive validity.
CONSTRUCT VALIDITY:
A construct is a theoretical concept, idea, or notion. Construct validity seeks agreement between such a theoretical concept and a specific measuring device or procedure.
For example, a test of intelligence nowadays must include measures of
multiple intelligences, rather than just logical-mathematical and linguistic
ability measures.
Note:
Both concurrent and predictive validity indexes serve the purposes of
prediction. The magnitude of correlation predicts the performance of the
examinees on the criterion measure from their performance on a newly
developed test or vice-versa.
Importance of Validity:
It is important for the test to be valid in order for the results to be accurately
applied and interpreted.
External Validity:
It is the extent to which the results of a study can be generalized to different
situations, different groups of people, different settings, and different
conditions.
Internal Validity:
It is the extent to which a study is free from flaws and that any differences in the
measurements are due to an independent variable and nothing else.
Population Validity:
It refers to the extent to which the results of a study can be generalized to a wider population of people.
Ecological Validity:
It refers to the extent to which the findings can be generalized beyond the
present situation.
Note that the results of a test can be reliable without necessarily being valid; reliability alone does not guarantee validity.
Practicality:
It refers to the facilities available to test developers regarding both the administration and the scoring procedures of a test. It is defined as:
“Practicality refers to the ease of administration and scoring of a test.”
As far as administration is concerned, test developers should be attentive to the
possibilities of giving a test under reasonably acceptable conditions.
Ease of administration:
It depends on:
The clarity, simplicity and readability of the instructions
A small number of subtests
The time required for the test
Ease of scoring:
A test can be scored subjectively or objectively. Since scoring is difficult and
time consuming, the trend is toward objectivity, simplicity and machine scoring.
Ease of Interpretation:
It refers to the meaningfulness of the scores obtained from the test. If the test results are misinterpreted or misapplied, they will be of little value and may actually be harmful to some individual or group.
Main Points related to Practicality:
Easy to administer
Easy to mark
Discrimination:
Discrimination refers to the ability of a test to distinguish between stronger and weaker candidates. A single score means little on its own: 70%, for example, means nothing at all unless all the other scores obtained on the test are known. Furthermore, a test on which all the candidates score 70% fails to discriminate between the various students.
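One common way to quantify this property for a single item is the discrimination index: the proportion of high scorers who answered the item correctly minus the proportion of low scorers who did. The data below are hypothetical.

```python
# Hypothetical results on one item (1 = correct, 0 = wrong),
# with examinees sorted from highest to lowest total test score
item_results_sorted = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

k = len(item_results_sorted) // 3  # size of the upper and lower groups
upper = item_results_sorted[:k]    # top third of examinees
lower = item_results_sorted[-k:]   # bottom third of examinees

# Discrimination index: ranges from -1 to +1; values near 0 mean the
# item does not separate strong candidates from weak ones
d = sum(upper) / k - sum(lower) / k
print(d)
```

An item that everyone answers correctly (or incorrectly) has an index of 0 and, like a test on which everyone scores 70%, discriminates between no one.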
COMPREHENSIVENESS:
It covers all the items that have been taught or studied. It includes items
from different areas of the material assigned for the test so as to check
accurately the amount of students’ knowledge.
BALANCE:
It tests linguistic as well as communicative competence and it reflects
the real command of the language. It tests also appropriateness and
accuracy.
OBJECTIVITY:
If the test is marked by different teachers, the score will be the same. The marking process should not be affected by the teacher's personality. Questions and answers should be so clear and definite that the marker gives each student the score he or she deserves.
ECONOMICAL:
It makes the best use of the teacher’s limited time for preparing and
grading and it makes the best use of the pupil’s assigned time for
answering all the items. So, we can say that oral exams in classes of more than 30 students are not economical, as they require too much time and effort to conduct.
AUTHENTICITY:
The language of the test should reflect everyday discourse.
SIMPLICITY:
Simplicity means that the test should be written in clear, correct and simple language. It is important to keep the method of testing as simple as possible while still testing the skill you intend to test.
SCORABILITY:
Scorability means that each item in the test has its own mark, related to the distribution of marks given by the Ministry of Education.
Backwash Effect:
Backwash (or washback) refers to the effect a test has on teaching and learning.
LEVELS OF BACKWASH:
The micro level (the effect of the test on individual students and teachers)
The macro level (the impact of the test on society and the educational system)
Conclusion:
These characteristics are necessary if test results are to be reliable, valid, objective, practical and effective in the evaluation of candidates in every field of human activity, and especially in education.