PROFED8
ESTABLISHING TEST VALIDITY AND RELIABILITY
How do we establish the validity and reliability of a test?
PRESENTED BY GROUP 6
OBJECTIVES:
1. Use procedures and statistical analysis to establish test validity and reliability;
2. Decide whether a test is valid or reliable; and
3. Decide which test items are easy and difficult.
TEST RELIABILITY = CONSISTENCY
TEST RELIABILITY
Measured under 3 conditions:
1. When retested on the same person
2. When retested on the same measure
3. Similarity of responses across items that measure the
same characteristics
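The third condition, similarity of responses across items measuring the same characteristic, is commonly quantified with Cronbach's alpha. A minimal sketch in Python, using hypothetical rating data:

```python
# Cronbach's alpha: internal-consistency reliability across items
# that measure the same characteristic. Scores below are hypothetical.

def cronbach_alpha(scores):
    """scores: list of respondents, each a list of item scores."""
    k = len(scores[0])                      # number of items
    n = len(scores)                         # number of respondents

    def variance(xs):                       # sample variance (n - 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of 5 students to a 4-item scale (1-5 ratings)
data = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
]
print(round(cronbach_alpha(data), 2))
```

Values of alpha near 1 indicate that the items respond consistently; a common rule of thumb treats 0.70 or higher as acceptable.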
TEST RELIABILITY
Three (3) factors that affect the reliability of a measure:
1. The number of items in a test
2. Individual differences of participants
3. External environment
WHAT ARE THE DIFFERENT WAYS TO ESTABLISH TEST RELIABILITY?
TEST-RETEST
It is a measure of reliability obtained by administering the same test twice, over a period of time, to the same group of individuals. The scores from Time 1 and Time 2 can then be correlated to evaluate the test's stability over time.
Example:
Let's say you're a teacher evaluating a math test for your students. To check its test-retest reliability,
you administer the same math test to your students on a Monday and then again on the following
Monday. If the scores from the first test closely match the scores from the second test, it suggests
that the test has good test-retest reliability, meaning that it measures students' math abilities
consistently from one week to the next. On the other hand, if the scores vary significantly between the
two administrations, it would indicate lower test-retest reliability and raise questions about the
consistency of the test.
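The stability coefficient in this scenario is simply the Pearson correlation between the two administrations. A minimal sketch in Python, using hypothetical score lists:

```python
# Test-retest reliability: Pearson correlation between scores from
# the same test given to the same students at two points in time.
# The score lists below are hypothetical.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

monday_1 = [85, 70, 92, 60, 78]   # first administration
monday_2 = [88, 68, 95, 62, 75]   # one week later
print(round(pearson_r(monday_1, monday_2), 2))
```

A correlation close to 1 supports the conclusion that the test is stable over time; a low correlation raises the concerns described above.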
PARALLEL FORMS
It is a measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct: skill, knowledge base, etc.) to the same group of individuals.
Examples:
ENTRANCE EXAMINATION
LICENSURE EXAMINATION
SPLIT-HALF
Split-half reliability is determined by dividing the total set of items relating to a construct of interest into halves (e.g., odd-numbered and even-numbered questions) and comparing the results obtained from the two subsets of items thus created.
Split-half is applicable when the test has a large number of items.
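As a sketch of the computation: correlate the odd-half and even-half scores, then apply the Spearman-Brown correction, a standard step since each half is only half the full test's length. The item responses below are hypothetical (1 = correct, 0 = wrong):

```python
# Split-half reliability: correlate odd-numbered vs even-numbered item
# scores, then apply the Spearman-Brown correction for full test length.
# Responses below are hypothetical (1 = correct, 0 = wrong).

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

responses = [  # each row: one student's answers to a 6-item test
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 1, 0],
]
odd  = [sum(row[0::2]) for row in responses]   # items 1, 3, 5
even = [sum(row[1::2]) for row in responses]   # items 2, 4, 6
r_half = pearson_r(odd, even)
reliability = 2 * r_half / (1 + r_half)        # Spearman-Brown
print(round(reliability, 2))
```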
TEST VALIDITY
Test validity refers to the degree to which the test actually
measures what it claims to measure. Test validity is also the
extent to which inferences, conclusions, and decisions made
on the basis of test scores are appropriate and meaningful.
Content Validity
When the items represent the domain being measured
The items are compared with the objectives of the program. The items need to directly measure the objectives (for achievement tests) or the definition (for scales). A reviewer conducts the checking.
Face Validity
When the test is presented well, free of errors, and administered well
The test items and layout are reviewed and tried out on a small group of respondents. A manual for administration can be made as a guide for the test administrator.
Predictive Validity
Predictive validity refers to the ability of a test or other measurement to predict a future outcome. Here, an outcome can be a behavior, performance, or even a disease that occurs at some point in the future.
A measure should predict a future criterion. An example is an entrance exam predicting students' grades after the first semester.
Construct Validity
The components or factors of the test should contain items that are strongly correlated.
Construct validity concerns the extent to which your test or measure accurately
assesses what it's supposed to. In research, it's important to operationalize
constructs into concrete and measurable characteristics based on your idea of
the construct and its dimensions.
Convergent Validity
shows how much a measure of one construct aligns with
other measures of the same or related constructs.
Concurrent Validity
concerns how well a measure matches up to some known criterion or gold standard, which can be another measure.
DIVERGENT VALIDITY
indicates that the results obtained by this instrument do not
correlate too strongly with measurements of a similar but
distinct trait.
The term "divergent validity" is sometimes used as a synonym for discriminant validity, and has even been used by some well-known writers in the measurement field, although it is not the commonly accepted term.
HOW TO DETERMINE IF AN ITEM IS EASY OR DIFFICULT?
For items with one correct alternative worth a single point, the item difficulty is simply the percentage of students who answer the item correctly. In this case, it is also equal to the item mean. The item difficulty index ranges from 0 to 100; the higher the value, the easier the question.
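A minimal sketch of the computation in Python, using hypothetical answer data:

```python
# Item difficulty index: percentage of students answering each item
# correctly (0-100; higher = easier). Answer data are hypothetical.

answers = [  # 1 = correct, 0 = wrong; rows = students, columns = items
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
]
n_students = len(answers)
difficulties = [
    100 * sum(row[item] for row in answers) / n_students
    for item in range(len(answers[0]))
]
for i, d in enumerate(difficulties, start=1):
    print(f"Item {i}: difficulty index = {d:.0f}")
```

In this made-up data set, items 1 and 3 (answered correctly by 4 of 5 students, index 80) are easy, while item 2 (index 20) is difficult.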
CASE STUDY
THANK
YOU
GROUP 6