
VALIDITY

and
RELIABILITY
ELEYNFIE A. SANICO, MAEd-EEd
EEd 505 - Evaluation of Learning
VALIDITY:
It is a term derived from the Latin word validus, meaning strong.

In contrast to what some teachers believe, validity is not a property of a test. It pertains to the accuracy of the inferences teachers make about students based on the information gathered from an assessment (McMillan, 2007; Fives & DiDonato-Barnes, 2013).
This implies that the conclusions teachers reach in their evaluation of student performance are valid if there is strong and sound evidence of the extent of students' learning.
An assessment is valid if it measures a student's actual knowledge and performance with respect to the intended outcomes, and not something else.
It is representative of the area of learning or content of the curricular aim being assessed (McMillan, 2007; Popham, 2011).
For instance, an assessment purportedly measuring the arithmetic skills of Grade 4 pupils is invalid if used for Grade 1 pupils because of issues with content (test content evidence) and level of performance (response process evidence).
A test that measures recall of mathematical formulas is invalid if it is supposed to assess problem solving.
There are three sources of information that
can be used to establish validity:

Content-Related Evidence
Criterion-Related Evidence
Construct-Related Evidence
A. Content-Related Evidence

Content-related evidence for validity pertains to the extent to which the test covers the entire domain of content.
For example, if a Grade 4 pupil was able to correctly answer 80% of the items in a Science test about matter, the teacher may infer that the pupil knows 80% of the content area.
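A minimal sketch of this inference (the item counts below are hypothetical):

```python
# Minimal sketch: inferring content mastery from a test score.
# Item counts are hypothetical.
items_total = 40     # items in the Science test about matter
items_correct = 32   # items the pupil answered correctly

estimated_mastery = items_correct / items_total
print(f"Estimated mastery of the content area: {estimated_mastery:.0%}")  # 80%
```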
Face validity

A test that appears to adequately measure the learning outcomes and content is said to possess face validity.
As the name suggests, it looks at the superficial
face value of the instrument.
It is based on the subjective opinion of the one
reviewing it.
Hence, it is considered non-systematic or non-
scientific.
A test that was prepared to assess the ability of pupils to construct simple sentences with correct subject-verb agreement has face validity if it looks like an adequate measure of that skill.
Instructional Validity
The extent to which an assessment is systematically
sensitive to the nature of instruction offered.
Popham (2006, p. 1) defined it as the "degree to which students' performances on a test accurately reflect the quality of instruction to promote students' mastery of what is being assessed."
Yoon & Resnick (1998) asserted that an instructionally valid
test is one that registers differences in the amount and kind of
instruction to which students have been exposed.

They also described the degree of overlap between the content tested and the content taught as "opportunity to learn," which has an impact on test scores.
Suppose that in the first grading period, a class will cover three economic issues:

Unemployment
Globalization
Sustainable development
Only two were discussed in class, but the assessment covered all three issues. Although all three were identified in the curriculum guide and may even be found in a textbook, the question remains whether the topics were all actually taught.
Inclusion of items that were not taken up in class reduces validity
because students had no opportunity to learn the knowledge or skill
being assessed.
Table of Specifications (ToS)
 It is prepared before developing the test. It is the test blueprint that identifies the content area and describes the learning outcomes at each level of the cognitive domain (Notar et al., 2004).
 It is a tool used in conjunction with lesson and unit planning to help teachers make genuine connections between planning, instruction, and assessment (Fives & DiDonato-Barnes, 2013).
 It assures teachers that they are testing students' learning across a wide range of content and readings, as well as cognitive processes requiring higher-order thinking.
Carey (as cited in Notar et al., 2004) specified six elements in ToS development:

1. balance among the goals selected for the examination
2. balance among the levels of learning
3. the test format
4. the total number of items
5. the number of items for each goal and level of learning
6. the enabling skills to be selected from each goal framework
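A minimal sketch of what such a blueprint might look like in code (the topics, cognitive levels, and item counts below are hypothetical):

```python
# Hypothetical Table of Specifications for a unit test.
# Rows: content goals; columns: cognitive levels; cells: number of items.
tos = {
    "Unemployment":            {"Remembering": 4, "Understanding": 3, "Applying": 2},
    "Globalization":           {"Remembering": 3, "Understanding": 3, "Applying": 2},
    "Sustainable development": {"Remembering": 3, "Understanding": 2, "Applying": 3},
}

# Element 4: the total number of items.
total_items = sum(sum(levels.values()) for levels in tos.values())
print(f"Total items: {total_items}")  # 25

# Element 2: balance among the levels of learning.
level_totals = {}
for levels in tos.values():
    for level, n in levels.items():
        level_totals[level] = level_totals.get(level, 0) + n
print(level_totals)  # {'Remembering': 10, 'Understanding': 8, 'Applying': 7}
```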
 Meanwhile, determining the number of items for each topic in the ToS depends on the instructional time. This means that topics that consumed longer instructional time, with more teaching-learning activities, should be given more emphasis (a sketch after the table illustrates this).
 Test items that demand higher-order thinking skills obviously require more time to answer, whereas simple recall items entail the least amount of time.
 Nitko & Brookhart (2011) give the average response time for each assessment task, as seen in the table below.
Table 4.1: Time Requirement for Certain Assessment Tasks

Type of Test Question                                  Time Required to Answer
Modified response (true-false)                         20-30 seconds
Modified true or false                                 30-45 seconds (Notar et al., 2004)
Sentence completion (one-word fill-in)                 40-60 seconds
Multiple choice with four responses (lower level)      40-60 seconds
Multiple choice (higher level)                         70-90 seconds
Matching type (5 stems, 6 choices)                     2-4 minutes
Short answer                                           2-4 minutes
Multiple choice (with calculation)                     2-5 minutes
Word problems (simple arithmetic)                      5-10 minutes
Short essays                                           15-20 minutes
Data analysis/graphing                                 15-25 minutes
Drawing models/labeling                                20-30 minutes
Extended essays                                        35-50 minutes
Validity suffers if the test is too short to sufficiently measure
behavior and cover the content.
Adding more items to the test may increase its validity. However, an excessively long test may be taxing to the students. Given this trade-off, teachers must construct tests that students can finish within a reasonable time.
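A minimal sketch of both ideas, assuming hypothetical instructional hours per topic and midpoint response times from Table 4.1:

```python
# Hypothetical sketch: allocate items in proportion to instructional time,
# then check that the resulting test fits a class period.
instructional_hours = {"Unemployment": 4, "Globalization": 2, "Sustainable development": 2}
total_test_items = 40  # hypothetical target test length

total_hours = sum(instructional_hours.values())
items_per_topic = {
    topic: round(total_test_items * hours / total_hours)
    for topic, hours in instructional_hours.items()
}
print(items_per_topic)  # {'Unemployment': 20, 'Globalization': 10, 'Sustainable development': 10}

# Time-budget check using midpoint times from Table 4.1 (in seconds per item).
n_lower, n_higher = 30, 10  # hypothetical mix of lower- and higher-level items
total_minutes = (n_lower * 50 + n_higher * 80) / 60
print(f"Estimated testing time: {total_minutes:.0f} minutes")  # ~38 minutes
```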
B. Criterion-Related Evidence

It refers to the degree to which test scores agree with an external criterion. As such, it is related to external validity. It examines the relationship between an assessment and another measure of the same trait (McMillan, 2007).
Three types of criteria (Nitko & Brookhart, 2011):

1. Achievement test scores
2. Ratings, grades, and other numerical judgments made by the teacher
3. Career data
Types of Criterion-Related Evidence

1. Concurrent Validity - provides an estimate of a student's current performance in relation to a previously validated or established measure.
2. Predictive Validity - pertains to the power or usefulness of test scores to predict future performance.
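As a minimal sketch, a concurrent validity coefficient can be estimated as the correlation between scores on the new test and scores on an established measure (the scores below are hypothetical):

```python
# Hypothetical sketch: concurrent validity as the correlation between
# a new test and a previously validated measure.
from statistics import correlation  # available in Python 3.10+

new_test    = [78, 85, 62, 90, 74, 88, 70, 95]
established = [75, 82, 65, 92, 70, 85, 72, 91]

validity_coefficient = correlation(new_test, established)
print(f"Concurrent validity coefficient: {validity_coefficient:.2f}")
```

For predictive validity, the criterion would instead be a measure collected later, such as the next year's grades.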
C. Construct-Related Evidence
A construct is an individual characteristic that explains some aspect
of behavior (Miller, Linn & Gronlund, 2009).
Construct-related evidence of validity is an assessment of the quality of the instrument used. It measures the extent to which the assessment is a meaningful measure of an unobservable trait or characteristic (McMillan, 2007).
Types of Construct-Related Evidence
(McMillan, 2007)

1. Theoretical
2. Logical
3. Statistical
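Statistical evidence is often gathered by checking that scores correlate strongly with measures of theoretically related traits (convergent evidence) and weakly with unrelated traits (discriminant evidence). A minimal sketch with hypothetical scores:

```python
# Hypothetical sketch of statistical construct-related evidence:
# convergent vs. discriminant correlations.
from statistics import correlation  # available in Python 3.10+

problem_solving  = [12, 18, 9, 20, 15, 17, 11, 19]   # new assessment
math_reasoning   = [14, 19, 10, 21, 14, 18, 12, 20]  # theoretically related construct
handwriting_rate = [26, 24, 27, 25, 23, 28, 24, 26]  # unrelated trait

print(f"Convergent:   {correlation(problem_solving, math_reasoning):.2f}")   # expect high
print(f"Discriminant: {correlation(problem_solving, handwriting_rate):.2f}") # expect near zero
```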
Unified Concept of Validity
In 1989, Messick proposed a unified concept of validity based on an expanded
theory of construct validity which addresses score meaning and social values in
test interpretation and test use.

His concept of unified validity "integrates considerations of content, criteria, and consequences into a construct framework for the empirical testing of rational hypotheses about score meaning and theoretically relevant relationships" (Messick, 1995, p. 741).
Validity of Assessment Methods
Moskal (2003) laid down five recommendations. These are intrinsically associated with the validity of the assessment.

1. The selected performance should reflect a valued activity.
2. The completion of the performance assessment should provide a valuable learning experience.
3. The statement of goals and objectives should be clearly aligned with the measurable outcomes of the performance activity.
4. The task should not examine extraneous or unintended variables.
5. Performance assessments should be fair and free from bias.
Threats to Validity
Miller, Linn & Gronlund (2009) identified ten factors that affect the validity of assessment results.
1. Unclear test directions
2. Complicated vocabulary and sentence structure
3. Ambiguous statements
4. Inadequate time limits
5. Inappropriate level of difficulty of test items
6. Poorly constructed test items
7. Inappropriate test items for outcomes being measured
8. Short test
9. Improper arrangement of items
10. Identifiable pattern of answers
Thank you
and
God bless.
