
VALIDITY AND RELIABILITY

Validity
- refers to the issue of whether an indicator that is developed to measure a concept really gauges that concept.
- the degree to which a test or measuring instrument measures what it intends to measure.
TYPES OF VALIDITY
1. Face Validity
- it involves an analysis of whether an instrument is
using a valid scale
- The researcher determines face validity by looking at
the features of the instrument. It includes the size of the
font or typeface, spacing, size of the paper used, and
other necessary details that will not distract the
respondents while answering the questionnaire.
2. Content Validity
- the extent to which the content or topic of the
test is truly representative of the content of the
course
-“the measure reflects the content of the
concept in question”. This may be undertaken by
asking other people whether the measure seems
to be getting at the concept that is the focus of attention.
Example: A researcher wishes to validate a questionnaire
in Science. He requests experts in Science to judge if the
items measure the knowledge, skills, and values supposed
to be measured.

Another way of testing content validity is for the experts to


check if the test items or questions represent the
knowledge, skills, and values suggested in the Science
course content.
3. Concurrent Validity
- the degree to which the test agrees or correlates with a criterion set up as an acceptable measure. The criterion is always available at the time of testing. It is applicable to tests employed for the diagnosis of existing status rather than for the prediction of future outcomes.
- it employs a criterion on which cases are known to differ and that is relevant to the concept in question
Example: A researcher wishes to validate a
Mathematics achievement test he has constructed. He
administers this test to a group of Mathematics
students. The result of this test is correlated with an
acceptable Mathematics test which has been previously proven valid. If the correlation is “high”,
the Mathematics test he has constructed is valid.
4. Predictive Validity
- uses a future criterion measure rather than a contemporary one.
- it refers to how well the test predicts some future behavior of the examinees.
Example: Suppose the researcher wants to estimate how well a high school student may be able to do in a college course on the basis of how well he has done on the tests he took in high school subjects.
5. Construct Validity
- refers to whether the test corresponds to its
theoretical construct.
- deduces hypotheses from a theory that is relevant
to the concept
- involves such tests as understanding, appreciation,
and interpretation of data
- examples are mechanical aptitude and intelligence
tests
Example: Suppose a researcher wishes to establish the validity of an IQ test using SCRIT (Safran Culture-Reduced Intelligence Test). He hypothesizes that pupils with high IQ also have high achievement test scores, and those with low IQ have low achievement test scores. He therefore administers both SCRIT and achievement tests to two groups of pupils with high and low IQ, respectively. If the results show that those with high IQ have high scores in the achievement test and those with low IQ have low scores in the achievement test, the test is valid.
6. Convergent Validity
- the validity of a measure ought to be gauged by comparing it to measures of the same concept developed through other methods
Example: If we develop a questionnaire measure of how much time managers spend on various activities, we might examine its validity by tracking a number of managers and using a structured observation schedule to record how much time is spent on the various activities and how frequently they occur.
RELIABILITY
- refers to the consistency of results.
- A reliable instrument yields the same results for individuals who take the test more than once.
METHODS OF ESTABLISHING RELIABILITY

1. Test-retest Method
The same instrument is administered twice to the
same group of subjects and the correlation
coefficient is determined.
The Spearman rank correlation coefficient, or Spearman rho, is the statistical tool used to measure the relationship between the paired ranks assigned to the individual scores on the two administrations in the test-retest method.

r_s = 1 − (6ΣD²) / [n(n² − 1)]
Respondents Test 1 Test 2 R1 R2 D D²
1 75 75 4 4 0 0
2 53 55 9 8.5 0.5 0.25
3 47 48 10 10 0 0
4 83 80 2 2 0 0
5 70 75 5.5 4 1.5 2.25
6 69 70 7 6.5 0.5 0.25
7 70 70 5.5 6.5 -1 1
8 55 55 8 8.5 -0.5 0.25
9 77 75 3 4 -1 1
10 85 85 1 1 0 0
ΣD² = 5.00
r_s = 1 − 6(5) / [10(10² − 1)]
r_s = 1 − 30/990
r_s = 1 − 0.03 = 0.97
There is a very high correlation between the scores
of the students on the first and second test which
implies that the instrument is highly reliable.
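For readers who want to verify the arithmetic, the following is a minimal Python sketch (assuming NumPy and SciPy are available); it simply copies the test-retest scores from the table above, ranks them with average ranks for ties, and applies the same simplified Spearman rho formula.

import numpy as np
from scipy.stats import rankdata

# Test-retest scores copied from the table above
test1 = np.array([75, 53, 47, 83, 70, 69, 70, 55, 77, 85])
test2 = np.array([75, 55, 48, 80, 75, 70, 70, 55, 75, 85])

n = len(test1)
r1 = rankdata(test1)   # ranks lowest to highest; tied scores receive average ranks
r2 = rankdata(test2)   # (the ranking direction does not change the rank differences)
d = r1 - r2
rs = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))
print(round(rs, 2))    # 0.97

Note that scipy.stats.spearmanr applies an exact correction for ties, so its value can differ slightly from the 0.97 given by the simplified formula above.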
2. Parallel-Forms Method
Parallel or equivalent forms of a test may be administered to the same group of subjects, and the paired observations correlated. The two forms of
the test must be constructed so that the content,
type of item, difficulty, instructions for
administration, and many others are similar but not
identical.
Example:
Form A: Convert 5 km to meters.
Form B: Convert 5000 meters to km.
3. Split-half Method
The test in this method may be administered
once, but the test items are divided into two halves.
The common procedure is to divide a test into odd
and even items. The two halves of the test must be
similar but not identical in content, number of items,
difficulty, means, and standard deviations.
The reliability coefficient of a whole test is
estimated by using the Spearman-Brown
formula.
r_wt = 2(r_ht) / (1 + r_ht)

where the half-test coefficient r_ht is obtained from the Spearman rank formula:

r_ht = 1 − (6ΣD²) / [n(n² − 1)]
Respondents Scores (Odd) Scores (Even) R1 R2 D D²
1 55 66 8 7 1 1
2 71 79 3 1 2 4
3 72 70 2 4 -2 4
4 43 50 10.5 9.5 1 1
5 35 31 12 12 0 0
6 64 72 6 3 3 9
7 57 57 7 8 -1 1
8 70 67 4 6 -2 4
9 69 69 5 5 0 0
10 48 50 9 9.5 -0.5 0.25
11 43 41 10.5 11 -0.5 0.25
12 75 75 1 2 -1 1
ΣD² = 25.5
r_ht = 1 − 6(25.5) / [12(12² − 1)]
r_ht = 1 − 153/1716 = 0.91

r_wt = 2(0.91) / (1 + 0.91) = 0.95

Since the reliability coefficient of the whole test is very high, the test is reliable.
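As a rough sketch (assuming NumPy and SciPy, with the half scores copied from the table above), the same two steps, the rank correlation of the halves and the Spearman-Brown correction, can be reproduced in Python:

import numpy as np
from scipy.stats import rankdata

# Odd-item and even-item half scores copied from the split-half table above
odd  = np.array([55, 71, 72, 43, 35, 64, 57, 70, 69, 48, 43, 75])
even = np.array([66, 79, 70, 50, 31, 72, 57, 67, 69, 50, 41, 75])

n = len(odd)
d = rankdata(odd) - rankdata(even)               # rank differences (ties get average ranks)
r_ht = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))   # half-test coefficient from the rank formula
r_wt = 2 * r_ht / (1 + r_ht)                     # Spearman-Brown estimate for the whole test
print(round(r_ht, 2), round(r_wt, 2))            # about 0.91 and 0.95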
4. Internal Consistency Method
This method is used with psychological tests which consist of dichotomously scored items. The examinee either passes (1) or fails (0) an item. The reliability coefficient in this method is determined by the Kuder-Richardson Formula:

r_xx = [N / (N − 1)] · [(SD² − Σpiqi) / SD²]

Where: N – number of items

SD² = [ΣX² − n(mean)²] / (n − 1)

Where: n – number of students
Item Students (1 to 10) fi pi qi piqi
1 1 1 1 1 1 1 0 0 1 1 8 0.8 0.2 0.16
2 1 1 1 1 1 0 1 0 1 0 7 0.7 0.3 0.21
3 1 1 1 1 0 1 1 1 1 0 8 0.8 0.2 0.16
4 1 1 0 0 1 1 0 1 0 1 6 0.6 0.4 0.24
5 1 1 0 0 0 0 0 1 0 1 4 0.4 0.6 0.24
6 1 1 0 1 0 1 1 1 0 0 6 0.6 0.4 0.24
7 1 1 1 0 0 1 0 1 0 1 6 0.6 0.4 0.24
Total 7 7 4 4 3 5 3 5 3 4 Σpiqi = 1.49
mean = ΣX / n = 45/10 = 4.5

SD² = [223 − 10(4.5)²] / (10 − 1) = 2.28

r_xx = [7 / (7 − 1)] · [(2.28 − 1.49) / 2.28] = 0.40

Since the reliability coefficient is low, the test is not reliable.
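As an illustrative sketch (assuming NumPy, with the 0/1 responses copied from the item-by-student table above), the Kuder-Richardson coefficient can be computed as follows:

import numpy as np

# Dichotomous (0/1) responses from the table above: rows = items, columns = students 1 to 10
X = np.array([
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 0, 1, 0, 1],
])

N, n = X.shape                  # N items, n students
p = X.mean(axis=1)              # proportion of students passing each item (pi)
q = 1 - p                       # proportion failing each item (qi)
totals = X.sum(axis=0)          # each student's total score
sd2 = totals.var(ddof=1)        # SD² of total scores, with n - 1 in the denominator
r_xx = (N / (N - 1)) * (sd2 - np.sum(p * q)) / sd2
print(round(r_xx, 2))           # about 0.40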
