
UNIT 8

QUALITIES DESIRED IN
MEASUREMENT PROCEDURE
RELIABILITY
• a measure of how consistent our measurements are.
• Reliability is a property of the scores, not of the test itself.
– Scores that are highly reliable are accurate and can be
reproduced.
• it is theoretically assumed that a test score can be divided
into two parts: a true score and an error score.
X = T + E;  X = observed score
T = true score
E = error score
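To make the true score model concrete, here is a minimal Python sketch; all values are simulated and purely illustrative. It generates true scores T, adds independent error E, and shows reliability as the share of observed-score variance attributable to true scores.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

T = rng.normal(loc=50, scale=10, size=1000)  # true scores (assumed distribution)
E = rng.normal(loc=0, scale=5, size=1000)    # random error, independent of T
X = T + E                                    # observed scores: X = T + E

# Reliability = proportion of observed-score variance that is true-score variance.
print(round(np.var(T) / np.var(X), 2))  # close to 100 / (100 + 25) = 0.80
```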
METHODS OF ESTIMATING SCORE
RELIABILITY
• STABILITY ESTIMATES (TEST RETEST)
• EQUIVALENT FORMS METHOD
• STABILITY AND EQUIVALENCE
• INTERNAL ANALYSIS METHODS
– Split half technique
– Kuder – Richardson techniques
– Coefficient alpha
• SCORER /JUDGE/ INTERRATER CONSISTENCY
STABILITY ESTIMATES (TEST RETEST)

• the same test is administered twice to the same group.
• each examinee will therefore have two scores:
– one from the first administration and another from the second
administration.
• Reliability is determined by calculating the
correlation between scores in the two administrations.
• useful where the traits being measured are stable. 
How long should the interval between the two test
administrations be?
• The interval should be determined in relation to the
stability of the trait being measured.
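As an illustration, a minimal Python sketch of a test-retest estimate; both score lists are hypothetical, not data from this unit.

```python
import numpy as np

# Hypothetical scores of ten students on the two administrations.
first  = np.array([9, 4, 6, 7, 5, 6, 6, 8, 7, 3])
second = np.array([8, 5, 6, 7, 4, 7, 6, 8, 6, 4])

# Test-retest reliability is the Pearson correlation between the two sets.
r = np.corrcoef(first, second)[0, 1]
print(round(r, 2))
```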
EQUIVALENT FORMS METHOD

• two equivalent forms of a test are prepared and given to the
same students at one time.
• a correlation between scores in the two forms
will be calculated to estimate reliability.
• If a high correlation is obtained, then the
two forms can substitute for each other.
• major limitations
– constructing even a single form is not simple for classroom
teachers, let alone two equivalent forms.
– examinees may answer the second form partly by simple recall.
STABILITY AND EQUIVALENCE

• involves the administration of two parallel
forms of a test at a relatively long interval.
• minimizes the problem of respondents
answering retests by simple recall.
• suitable for the measurement of gain or
improvement.
– a psychologist administers a personality inventory
now and its equivalent form later to estimate the degree
of personality change.
INTERNAL ANALYSIS METHODS

• a test is administered only once.


• internal analysis methods include
– Split half techniques,
– Kuder Richardson techniques, and
– Coefficient Alpha. 
Split half technique

• the test is divided into two equal halves.


– a common approach is to score the odd-numbered and
even-numbered items separately.
– The correlation coefficient between the two sets of scores is an
estimate of reliability.
– This estimate, however, is based on a half-length test.
– The reliability estimate of the full length is given by the
Spearman-Brown formula:

$$r_n = \frac{n\,r}{1 + (n-1)\,r}$$

r_n = reliability of the lengthened test
n = number of times the test is lengthened
r = reliability of the original test scores
• For example, a test has been split into two
halves (using odd- and even-numbered items)
and the reliability based on the split halves was
0.4. The reliability of the scores on the
full-length test is then:

$$r_n = \frac{2(0.4)}{1 + (2-1)(0.4)} = \frac{0.8}{1.4} \approx 0.57$$
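A minimal Python sketch of the split-half procedure with the Spearman-Brown step-up. The matrix convention (rows = students, columns = items) is an assumption matching the item-response table shown later for KR-20.

```python
import numpy as np

def spearman_brown(r, n=2):
    # Reliability of a test lengthened n times, given reliability r.
    return n * r / (1 + (n - 1) * r)

# The unit's example: split-half reliability of 0.4 on the half-length test.
print(round(spearman_brown(0.4), 2))  # 0.57 for the full-length test

def split_half_reliability(items):
    # items: 0/1 matrix, rows = students, columns = items.
    odd = items[:, 0::2].sum(axis=1)    # scores on items 1, 3, 5, ...
    even = items[:, 1::2].sum(axis=1)   # scores on items 2, 4, 6, ...
    r_half = np.corrcoef(odd, even)[0, 1]
    return spearman_brown(r_half, n=2)  # step up to full length
```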
Kuder – Richardson Techniques

• developed by Kuder and Richardson (1937)
• they developed a number of formulae; two of the most
widely accepted and used are K-R 20 and K-R 21
• K-R 20 is given as

$$r = \frac{K}{K-1}\left(1 - \frac{\sum pq}{S^2}\right)$$
K = number of test items
p = proportion of correct responses to a particular item
q = proportion of incorrect responses to a particular item
S² = variance of scores on the entire test
• KR 20 is used with dichotomously scored test items.
• This formula is conceptually the average of all
possible split-half correlations for a test.
KR-21 is given as

$$r = \frac{K}{K-1}\left(1 - \frac{\bar{X}(K-\bar{X})}{K\,S^2}\right)$$
X̄ = mean of scores on the test
K = number of test items
S² = variance of scores on the entire test
• KR-21 is based on the assumption that
– the test items do not markedly vary in difficulty.
– it therefore does not use data on the difficulty level of
individual items.
– Σpq is instead estimated from the test mean and the
number of items.
Students    Items                                       Total
            1   2   3   4   5   6   7   8   9   10     score

1           0   1   1   1   1   1   1   1   1   1        9
2           0   0   1   0   0   1   1   1   0   0        4
3           1   0   0   1   0   1   0   1   1   1        6
4           1   1   0   1   0   1   0   1   1   1        7
5           0   1   1   1   1   0   0   0   0   1        5
6           1   1   1   1   1   0   0   0   1   0        6
7           1   1   1   1   1   0   0   0   1   0        6
8           1   1   1   0   0   1   1   1   1   1        8
9           1   1   1   0   0   0   1   1   1   1        7
10          0   1   0   0   0   1   0   0   1   0        3
p_i         0.6 0.8 0.7 0.6 0.4 0.6 0.4 0.6 0.8 0.6
q_i         0.4 0.2 0.3 0.4 0.6 0.4 0.6 0.4 0.2 0.4
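A minimal Python sketch computing KR-20 and KR-21 from the item-response table above. It assumes the population variance of the total scores is used, as in a hand calculation.

```python
import numpy as np

# Item responses from the table: rows = students 1-10, columns = items 1-10.
X = np.array([
    [0,1,1,1,1,1,1,1,1,1],   # student 1, total 9
    [0,0,1,0,0,1,1,1,0,0],   # student 2, total 4
    [1,0,0,1,0,1,0,1,1,1],
    [1,1,0,1,0,1,0,1,1,1],
    [0,1,1,1,1,0,0,0,0,1],
    [1,1,1,1,1,0,0,0,1,0],
    [1,1,1,1,1,0,0,0,1,0],
    [1,1,1,0,0,1,1,1,1,1],
    [1,1,1,0,0,0,1,1,1,1],
    [0,1,0,0,0,1,0,0,1,0],
])

K = X.shape[1]                 # number of items
totals = X.sum(axis=1)         # total score per student
S2 = np.var(totals)            # variance of total scores (population form)
p = X.mean(axis=0)             # p_i: proportion correct per item
q = 1 - p                      # q_i

kr20 = (K / (K - 1)) * (1 - np.sum(p * q) / S2)
mean = totals.mean()
kr21 = (K / (K - 1)) * (1 - mean * (K - mean) / (K * S2))
print(round(kr20, 2), round(kr21, 2))  # about 0.26 and 0.20 for these data
```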
Coefficient Alpha
• This reliability estimate works with scores that
can take different values other than 0 and 1.
• used for attitude scales.
• In an attitude scale with alternatives strongly
agree, agree, undecided, disagree, and strongly
disagree, which are scored 5, 4, 3, 2, 1
respectively, neither KR-20 nor KR-21 works.
• In such a case you use coefficient alpha.
Coefficient alpha is given as

$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum S_i^2}{S_x^2}\right)$$

K = the number of items in the test
S_i² = variance of scores on each item
S_x² = variance of scores on the test
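A minimal Python sketch of coefficient alpha on a small 5-point attitude scale; the response matrix is invented for illustration.

```python
import numpy as np

# Hypothetical Likert responses (5 = strongly agree ... 1 = strongly disagree);
# rows = respondents, columns = items.
X = np.array([
    [5, 4, 4, 5],
    [3, 3, 2, 3],
    [4, 4, 5, 4],
    [2, 1, 2, 2],
    [4, 5, 4, 3],
])

K = X.shape[1]
sum_item_vars = np.var(X, axis=0).sum()   # sum of S_i^2 over items
total_var = np.var(X.sum(axis=1))         # S_x^2, variance of total scores

alpha = (K / (K - 1)) * (1 - sum_item_vars / total_var)
print(round(alpha, 2))  # about 0.92 for these invented responses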
SCORER/JUDGE/INTERRATER CONSISTENCY

• This reliability estimate is especially useful
with subjective types of test items, such as essays.
• It is given as

$$\text{Percent Exact Agreement} = 100 \times \frac{\sum X_i}{N}$$

– X_i = number of testees on whom agreement (same
scoring) was reached by the judges for a given item
– N = number of testees whose essay items are being
analyzed.
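A minimal sketch of the percent exact agreement computation, with hypothetical scores from two judges on the same eight essays.

```python
# Hypothetical scores two judges assigned to the same eight essays.
judge_a = [4, 3, 5, 2, 4, 4, 3, 5]
judge_b = [4, 3, 4, 2, 4, 5, 3, 5]

# Count the testees on whom the judges gave the same score.
agreements = sum(a == b for a, b in zip(judge_a, judge_b))
percent_exact_agreement = 100 * agreements / len(judge_a)
print(percent_exact_agreement)  # 75.0: exact agreement on 6 of 8 essays
```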
FACTORS INFLUENCING RELIABILITY

TEST RELATED FACTORS


– test length,
– difficulty of test items and
– score variability
EXAMINEE RELATED FACTORS
– nature of the group tested,
– student testwiseness and
– student motivation
ADMINISTRATION RELATED FACTORS
– time limits and
– cheating opportunities
VALIDITY
• refers to the accuracy with which the scores
measure a particular ability of interest. 
• reliability is a necessary ingredient of validity
but it is not sufficient to guarantee validity.
• Invalid scores may yield a high index of
reliability.
METHODS OF ESTIMATING VALIDITY

• content related validity,
• criterion related validity,
• construct related validity and
• face validity
CONTENT RELATED VALIDITY

• the items in a given test should be a
representative sample of the universe of content in
a subject/course; this is what content related
validity refers to.
• Content validity relates also to the degree of
correspondence between the test item and the
objectives to be measured.
• Content validity can be achieved if teachers
depend on a table of specifications when they
prepare tests.
CRITERION RELATED VALIDITY

• a criterion measure is an accepted standard against which
some test is compared to validate the use of the test as a
predictor.
• Criterion related evidence has two forms:
– concurrent validity and
– predictive validity.
• Concurrent validity evidence is collected to make sure two
different tests measure the same thing effectively.
• Predictive validity evidence is collected to make sure a
given test adequately predicts future performance.
• Both concurrent and predictive validity evidence are
determined mathematically with the use of correlation
coefficients.
CONSTRUCT RELATED VALIDATION EVIDENCE

• refers to collecting evidence on whether a test measures the
construct it claims to measure.
– construct refers to a psychological construct, a theoretical
conceptualization of an aspect of human behavior
• involves the use of both content related evidence and
criterion related evidence.
• the following could be done among other things
1. Systematically define the domain attached to the
construct (a procedure from content validation).
2. Determine to what extent scores on this test correlate
with scores from other measures (a procedure from
criterion validation).
FACE VALIDITY

• refers to the degree to which a measurement
instrument appears to measure what it is intended
to measure
• may not be as important as content validity,
criterion related validity or construct related
validity
• not all validity types are equally relevant for
different types of tests with different purposes.
– For classroom tests, because objectives and contents
can be clearly spelled out, content validity is the most
relevant.
FACTORS INFLUENCING VALIDITY

• The following are some major ones.


1. Difficulty level of items in the predictor
test.
2. Reliabilities of both the predictor and
the criterion.
3. Nature of the group tested
THE END
