

ESTABLISHING TEST
VALIDITY AND
RELIABILITY
How do we establish the validity and
reliability of a test?

PRESENTED BY GROUP 6

Willet Cambri Junalyn Degamon Jerson Francisco

Teresa Labanan Hope Sanchez



OBJECTIVES:
1. USE PROCEDURES AND STATISTICAL ANALYSIS TO ESTABLISH TEST VALIDITY AND RELIABILITY;
2. DECIDE WHETHER A TEST IS VALID OR RELIABLE; AND
3. DECIDE WHICH TEST ITEMS ARE EASY AND DIFFICULT.
TEST
RELIABILITY
=
CONSISTENCY
TEST RELIABILITY
Reliability is measured under three (3) conditions:
1. When retested on the same person
2. When retested with the same measure
3. Similarity of responses across items that measure the same characteristic
TEST RELIABILITY
Three (3) factors that affect the reliability of a measure:
1. The number of items in a test
2. Individual differences of participants
3. External environment
WHAT ARE THE DIFFERENT
WAYS TO ESTABLISH TEST
RELIABILITY?
TEST-RETEST
TEST-RETEST
It is a measure of reliability obtained by administering the same test
twice over a period of time to a group of individuals. The scores
from Time 1 and Time 2 can then be correlated in order to evaluate
the test for stability over time.

Example:
Let's say you're a teacher evaluating a math test for your students. To check its test-retest reliability,
you administer the same math test to your students on a Monday and then again on the following
Monday. If the scores from the first test closely match the scores from the second test, it suggests
that the test has good test-retest reliability, meaning that it measures students' math abilities
consistently from one week to the next. On the other hand, if the scores vary significantly between the
two administrations, it would indicate lower test-retest reliability and raise questions about the
consistency of the test.
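
As a quick illustration, test-retest reliability can be estimated by correlating the two sets of scores. The sketch below uses Python with hypothetical scores for ten students; the data and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical scores of the same 10 students on the same math test,
# administered one week apart (illustrative data only).
week_1 = np.array([78, 85, 92, 64, 70, 88, 95, 73, 81, 67])
week_2 = np.array([80, 83, 90, 66, 72, 90, 94, 70, 84, 65])

# The test-retest reliability estimate is the correlation between
# the two administrations; values near 1 indicate stable scores.
r = np.corrcoef(week_1, week_2)[0, 1]
print(f"Test-retest reliability (r) = {r:.2f}")
```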
PARALLEL
FORMS
PARALLEL FORMS
It is a measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct: skill, knowledge base, etc.) to the same group of individuals.
Example:

ENTRANCE EXAMINATION
LICENSURE EXAMINATION
SPLIT-HALF
SPLIT-HALF
Split-half reliability is determined by dividing the total set of items
relating to a construct of interest into halves (odd-numbered and
even-numbered questions) and comparing the results obtained from
the two subsets of items thus created.
Split-half is applicable when the test has a large number of items.

SPEARMAN RANK CORRELATION COEFFICIENT


SPEARMAN-BROWN TEST RELIABILITY
SCENARIO
A test with 160 questions is administered to fourteen students from different institutions. The test questions are related to statistics. The task is to determine the reliability coefficient of the odd and even items using the Spearman Rank Correlation Coefficient and the Spearman-Brown formula.
SPEARMAN CORRELATION COEFFICIENT
SPEARMAN-BROWN TEST RELIABILITY
Pxx = 2p / (1 + p)
where:
p: Correlation between the odd-item and even-item half-tests
Pxx: Estimated reliability of the whole test
AS A RESULT...
The reliability of the half test is 0.80 and the reliability of the whole test is 0.88, which denotes a very good relationship. This means that the whole-test reliability between the odd and even items of the achievement test is good.
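
The same procedure can be sketched in Python. The item scores below are hypothetical (not the 160-item, 14-student data described above), so the printed values will differ from 0.80 and 0.88; the sketch only shows the mechanics: split the items into odd and even halves, correlate the half-test totals with the Spearman rank correlation, then apply the Spearman-Brown formula.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical item-score matrix: rows are 14 examinees, columns are items
# (1 = correct, 0 = wrong). A real run would use the full 160-item data.
rng = np.random.default_rng(1)
ability = rng.normal(size=(14, 1))        # latent ability of each examinee
difficulty = rng.normal(size=(1, 20))     # difficulty of each item
scores = (ability - difficulty + rng.normal(scale=0.7, size=(14, 20)) > 0).astype(int)

# Total score on the odd-numbered items and on the even-numbered items.
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)

# Spearman rank correlation between the two half-test totals.
rho_half, _ = spearmanr(odd_half, even_half)

# Spearman-Brown formula steps the half-test value up to the whole test.
rho_full = 2 * rho_half / (1 + rho_half)
print(f"Half-test reliability:  {rho_half:.2f}")
print(f"Whole-test reliability: {rho_full:.2f}")
```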
KUDER-RICHARDSON
FORMULA 20 &
CRONBACH’S ALPHA
KUDER-RICHARDSON
FORMULA 20
The Kuder-Richardson Formula 20, often abbreviated
KR-20, is used to measure the internal consistency
reliability of a test in which each question only has two
answers: right or wrong.
KR-20 = (k / (k - 1)) * (1 – Σpjqj / σ²)
where:
k: Total number of questions
pj: Proportion of individuals who answered question j correctly
qj: Proportion of individuals who answered question j incorrectly
σ²: Variance of scores for all individuals who took the test

The value for KR-20 ranges from 0 to 1, with higher values indicating higher reliability.
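
A minimal sketch of the KR-20 computation in Python, using a small hypothetical matrix of right/wrong answers (the data are made up for illustration):

```python
import numpy as np

# Hypothetical responses: rows are examinees, columns are items (1 = right, 0 = wrong).
scores = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
])

k = scores.shape[1]                          # total number of questions
p = scores.mean(axis=0)                      # proportion answering each item correctly
q = 1 - p                                    # proportion answering each item incorrectly
total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20 = {kr20:.2f}")                 # 0 to 1; higher means more reliable
```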
CRONBACH’S ALPHA
Cronbach’s alpha coefficient measures the internal
consistency, or reliability, of a set of survey items.
Cronbach’s alpha quantifies the level of agreement on a
standardized 0 to 1 scale. Higher values indicate higher
agreement between items.
High Cronbach’s alpha values indicate that response values for each participant across a set of questions are consistent, while low values indicate that the set of items does not reliably measure the same construct.
CRONBACH’S ALPHA FORMULA
α = (k / (k - 1)) * (1 – Σσi² / σt²)
where:
k: Number of items
σi²: Variance of item i
σt²: Variance of the total scores
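
A minimal sketch of the same calculation in Python, using hypothetical responses on a 1-5 scale (data are illustrative only):

```python
import numpy as np

# Hypothetical survey responses: rows are respondents, columns are items (1-5 scale).
responses = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
])

k = responses.shape[1]                          # number of items
item_vars = responses.var(axis=0, ddof=1)       # variance of each item
total_var = responses.sum(axis=1).var(ddof=1)   # variance of total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")        # higher = more consistent items
```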
INTER-RATER
RELIABILITY
Inter-rater reliability
- This procedure is used to determine the consistency of
multiple raters when using rating scales and rubrics to judge
performance.
- Inter-rater is applicable when the assessment requires the use
of multiple raters.

EXAMPLE: Judges scoring the same performances in a sport.


Kendall’s coefficient of concordance (Kendall’s W)
- Used to determine whether the ratings provided by multiple raters agree with each other.
- The closely related Kendall’s tau measures the strength of the relationship between two ordinal-level variables.
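
For two raters (two ordinal variables), Kendall's tau can be computed directly; the rankings below are hypothetical.

```python
from scipy.stats import kendalltau

# Hypothetical rankings given by two judges to the same six performances (1 = best).
judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 3, 5, 4, 6]

tau, p_value = kendalltau(judge_a, judge_b)
print(f"Kendall's tau = {tau:.2f} (p = {p_value:.3f})")
# tau near 1 means the two judges rank the performances in nearly the same order.
```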
LINEAR REGRESSION
The simplest mathematical relationship between two variables X and Y is a linear relationship. In a cause-and-effect relationship, the independent variable is the cause and the dependent variable is the effect.

Linear regression is demonstrated when you have two variables that are measured, such as two sets of scores on a test taken at two different times by the same participants.

Score 2 = 4.8493 + 1.0403x
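
As a sketch, the regression line relating two sets of scores can be fitted in Python; the scores below are hypothetical, so the fitted coefficients will not match the 4.8493 and 1.0403 shown above.

```python
import numpy as np

# Hypothetical scores of the same students on a test taken at two different times.
score_1 = np.array([60, 65, 70, 75, 80, 85, 90])   # independent variable (X)
score_2 = np.array([68, 71, 78, 83, 86, 93, 97])   # dependent variable (Y)

# Fit Score 2 = intercept + slope * Score 1 by least squares.
slope, intercept = np.polyfit(score_1, score_2, deg=1)
print(f"Score 2 = {intercept:.4f} + {slope:.4f} * Score 1")
```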
Pearson correlation coefficient (r)
The Pearson correlation coefficient (r) is the most common way of measuring a linear correlation. It is a number between –1 and 1 that measures the strength and direction of the relationship between two variables.

DIFFERENCE BETWEEN A POSITIVE AND A NEGATIVE CORRELATION
POSITIVE CORRELATION
When the value of the correlation coefficient is positive, it means that the higher the scores in X, the higher the scores in Y.
NEGATIVE CORRELATION
When the value of the correlation coefficient is negative, it means that the higher the scores in X, the lower the scores in Y.
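
The sign of r can be checked directly; the two small data sets below are hypothetical and exist only to show one positive and one negative correlation.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])                      # e.g., hours of study / absences
scores_up = np.array([55, 62, 70, 78, 85])         # rises with x  -> positive r
scores_down = np.array([90, 82, 75, 66, 58])       # falls with x  -> negative r

print("Positive correlation:", round(np.corrcoef(x, scores_up)[0, 1], 2))
print("Negative correlation:", round(np.corrcoef(x, scores_down)[0, 1], 2))
```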

DETERMINING THE SIGNIFICANCE OF THE CORRELATION
To determine whether a correlation coefficient value is significant, it is compared with a critical value, the value expected by chance alone. When the computed value is greater than the critical value, the correlation is considered significant: there is at least a 95% chance that the variables are genuinely correlated (less than a 5% probability that the result is due to chance).
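
In practice, the comparison against a critical value is equivalent to checking the p-value that statistics packages report; a sketch with scipy and hypothetical paired scores:

```python
from scipy.stats import pearsonr

# Hypothetical paired scores for ten students.
x = [12, 15, 14, 10, 18, 20, 11, 16, 13, 17]
y = [30, 35, 33, 26, 40, 44, 28, 36, 31, 39]

r, p_value = pearsonr(x, y)
# p < 0.05 corresponds to exceeding the critical value at the 95% confidence level.
print(f"r = {r:.2f}, p = {p_value:.4f}, significant: {p_value < 0.05}")
```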

DETERMINING THE STRENGTH OF A CORRELATION
CRONBACH’S ALPHA
KENDALL’S W FORMULA
W = 12S / (m²(n³ – n))
where:
m: Number of raters
n: Number of subjects being rated
S: Sum of squared deviations of the rank sums from their mean

A Kendall’s W coefficient value of 0.38 indicates the agreement of the three raters in the five demonstrations. There is only fair agreement among the three raters because the value is far from 1.00.
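
A minimal sketch of the W computation for three raters scoring five demonstrations; the ratings below are hypothetical, so the result will not reproduce the 0.38 reported above.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical ratings: rows are 3 raters, columns are 5 demonstrations.
ratings = np.array([
    [4, 2, 5, 3, 1],
    [3, 1, 5, 4, 2],
    [4, 3, 5, 1, 2],
])

m, n = ratings.shape                                   # m raters, n demonstrations
ranks = np.vstack([rankdata(row) for row in ratings])  # rank within each rater
rank_sums = ranks.sum(axis=0)                          # total rank of each demonstration

# Kendall's W = 12S / (m^2 (n^3 - n)), where S is the sum of squared
# deviations of the rank sums from their mean.
s = ((rank_sums - rank_sums.mean()) ** 2).sum()
w = 12 * s / (m ** 2 * (n ** 3 - n))
print(f"Kendall's W = {w:.2f}")                        # 1 = perfect agreement, 0 = none
```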

TEST
VALIDITY
Test validity refers to the degree to which the test actually
measures what it claims to measure. Test validity is also the
extent to which inferences, conclusions, and decisions made
on the basis of test scores are appropriate and meaningful.
PRESENTED BY GROUP 6
Content Validity
When the items represent the domain being measured
The items are compared with the objectives of the program. The items need to directly measure the objectives (for achievement tests) or the definition (for scales). A reviewer conducts the checking.

Face Validity
When the test is presented well, free of errors, and administered well
The test items and layout are reviewed and tried out on a small group of respondents. A manual for administration can be made as a guide for the test administrator.
Predictive Validity
Predictive validity refers to the ability of a test or other measurement to predict a future outcome. Here, an outcome can be a behavior, performance, or even a disease that occurs at some point in the future. A measure should predict a future criterion.

An example is an entrance exam predicting the grades of the students after the first semester.

Construct Validity
The components or factors of the test should contain items that are strongly correlated.
Construct validity concerns the extent to which your test or measure accurately
assesses what it's supposed to. In research, it's important to operationalize
constructs into concrete and measurable characteristics based on your idea of
the construct and its dimensions.
Convergent Validity
Shows how much a measure of one construct aligns with other measures of the same or related constructs.

Concurrent Validity
Concerns how well a measure matches up to some known criterion or gold standard, which can be another measure.
DIVERGENT VALIDITY
Indicates that the results obtained by this instrument do not correlate too strongly with measurements of a similar but distinct trait.
The term “divergent validity” is sometimes used as a synonym for discriminant validity and has even been used by some well-known writers in the measurement field, although it is not the commonly accepted term.

HOW TO DETERMINE IF AN ITEM IS EASY OR DIFFICULT?
For items with one correct alternative worth a single point, the item difficulty is simply the percentage of students who answer an item correctly. In this case, it is also equal to the item mean. The item difficulty index ranges from 0 to 100; the higher the value, the easier the question.
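
A short sketch of the item difficulty index in Python, using a hypothetical right/wrong response matrix; the 75% and 25% cut-offs used to label items are illustrative, not fixed rules.

```python
import numpy as np

# Hypothetical responses: rows are students, columns are items (1 = correct, 0 = wrong).
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
])

# Item difficulty index = percentage of students answering the item correctly.
difficulty = responses.mean(axis=0) * 100
for item, d in enumerate(difficulty, start=1):
    # Illustrative cut-offs: high index = easy item, low index = difficult item.
    label = "easy" if d >= 75 else "difficult" if d <= 25 else "moderate"
    print(f"Item {item}: difficulty index = {d:.0f}% ({label})")
```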

PRESENTED BY GROUP 6
CASE STUDY

“PLAGIARISM IS NOT A 'LANG': ACADEMIC DISHONESTY AND ITS IMPACT ON ASSESSMENTS’ EFFECTIVITY”

THANK
YOU
GROUP 6
