
LESSON 6

Establishing Test Validity and Reliability

Test Reliability

What is Test Reliability?

Reliability is the consistency of responses to a measure under three considerations:
(1) when the same person is retested;
(2) when the same measure is retested; and
(3) the similarity of responses across items that measure the same characteristic.

Factors Affecting the Reliability of a Measure

The reliability of a measure can be high or low depending on the following factors:

1. The number of items in a test - The more items a test has, the higher the likelihood of reliability, because a large pool of items raises the probability of obtaining consistent scores.

2. Individual differences of participants - Every participant possesses characteristics that affect their performance in a test, such as fatigue, concentration, innate ability, perseverance, and motivation. These individual factors change over time and affect the consistency of the answers in the test.

3. External environment - The external environment may include room temperature, noise level, depth of instruction, exposure to materials, and quality of instruction, all of which can change examinees' responses to a test.

What are the different ways to establish test reliability?

The specific kind of reliability will depend on (1) the variable you are measuring, (2) the type of test, and (3) the number of versions of the test.

Methods in Testing Reliability
1. Test-retest
2. Parallel forms
3. Split-half
4. Test of internal consistency using Kuder-Richardson and Cronbach's alpha
5. Inter-rater reliability

1 Test-Retest

Test-retest reliability measures the consistency of results when you repeat the same test on the same sample at a different point in time. You use it when you are measuring something that you expect to stay constant in your sample.

Method in Testing Reliability: Test-retest

How is this reliability done? You have a test, and you administer it at one time to a group of examinees. Administer it again at another time to the "same group" of examinees. For tests that measure stable characteristics, such as standardized aptitude tests, the time interval between the first and second administration should be no more than 6 months. The post-test can be given after a minimum time interval of 30 minutes. The responses in the test should be more or less the same across the two points in time.

What statistics is used? Correlate the test scores from the first and the second administration. A significant and positive correlation indicates that the test has temporal stability over time. Correlation refers to a statistical procedure where a linear relationship is expected between two variables. You may use the Pearson Product-Moment Correlation (Pearson r).
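As a quick sketch (with invented scores and the scipy library as an assumed dependency), the test-retest correlation could be computed like this:

```python
# Minimal sketch: correlate two administrations of the same test taken by
# the same group. The score lists are invented for illustration.
from scipy.stats import pearsonr

first_admin = [78, 85, 62, 90, 74, 88, 69, 95, 81, 73]   # first testing
second_admin = [80, 83, 65, 92, 70, 90, 72, 93, 79, 75]  # retest, same group

r, p_value = pearsonr(first_admin, second_admin)
print(f"Test-retest reliability: r = {r:.2f}, p = {p_value:.4f}")
# A significant, positive r suggests the test has temporal stability.
```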

2 Parallel Forms

Parallel-forms reliability measures the correlation between two equivalent versions of a test. You use it when you have two different assessment tools or sets of questions designed to measure the same thing.

Method in Testing Reliability: Parallel forms

How is this reliability done? There are two versions of a test, and the items need to measure exactly the same skill. Each test version is called a "form." Administer one form at one time and the other form at another time to the "same" group of participants. The responses on the two forms should be more or less the same. Parallel forms are applicable if there are two versions of the test.

What statistics is used? Correlate the test results for the first form and the second form. A significant and positive correlation coefficient is expected; it indicates that the responses in the two forms are the same or consistent. Pearson r is usually used for this analysis.
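A similar sketch for parallel forms, again with invented scores and numpy as an assumed dependency:

```python
# Minimal sketch: correlate scores on two equivalent forms of a test taken
# by the same participants. The data are invented for illustration.
import numpy as np

form_a = [24, 31, 18, 27, 35, 22, 29, 33]
form_b = [26, 30, 20, 25, 34, 21, 31, 32]

r = np.corrcoef(form_a, form_b)[0, 1]  # Pearson r between the two forms
print(f"Parallel-forms reliability: r = {r:.2f}")
```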

3 Split-Half

Split-half reliability is a measure of consistency whereby the set of items that make up a measure is split in two during the data analysis stage, and the scores for each half of the measure are compared with one another.

Method in Testing Reliability: Split-half

How is this reliability done? Administer a test to a group of examinees. The items are split into halves, usually using the odd-even technique: get the sum of the points in the odd-numbered items and correlate it with the sum of the points in the even-numbered items. Each examinee will have two scores coming from the same test, and the scores on the two sets should be close or consistent. Split-half is applicable when the test has a large number of items.

What statistics is used? Correlate the two sets of scores using Pearson r. After the correlation, apply another formula called the Spearman-Brown coefficient. The coefficients obtained using Pearson r and Spearman-Brown should be significant and positive to mean that the test has internal consistency reliability.
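A sketch of the odd-even technique with the Spearman-Brown correction, using an invented examinee-by-item matrix (scipy assumed):

```python
# Minimal sketch of split-half reliability. Each row holds one examinee's
# points per item; the data are invented for illustration.
from scipy.stats import pearsonr

item_scores = [
    [1, 0, 1, 1, 0, 1, 1, 1],  # examinee 1
    [0, 0, 1, 0, 1, 0, 0, 1],  # examinee 2
    [1, 1, 1, 1, 1, 0, 1, 1],  # examinee 3
    [0, 1, 0, 0, 0, 1, 0, 0],  # examinee 4
    [1, 1, 0, 1, 1, 1, 1, 0],  # examinee 5
]

# Odd-even split: sum the odd-numbered items (indices 0, 2, ...) and the
# even-numbered items (indices 1, 3, ...) for each examinee.
odd_totals = [sum(row[0::2]) for row in item_scores]
even_totals = [sum(row[1::2]) for row in item_scores]

r_half, _ = pearsonr(odd_totals, even_totals)

# Spearman-Brown estimates the reliability of the full-length test from
# the half-test correlation: r_sb = 2r / (1 + r).
r_sb = 2 * r_half / (1 + r_half)
print(f"Half-test r = {r_half:.2f}, Spearman-Brown = {r_sb:.2f}")
```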

4 Test of Internal Consistency Using Kuder-Richardson and Cronbach's Alpha Method

The Kuder-Richardson Formula 20, often abbreviated KR-20, is used to measure the internal consistency reliability of a test in which each question has only two answers: right or wrong.

Method in Testing Reliability: Test of internal consistency using Kuder-Richardson and Cronbach's alpha

How is this reliability done? This procedure involves determining whether the scores for each item are consistently answered by the examinees. After administering the test to a group of examinees, determine and record the scores for each item. The idea is to see if the responses per item are consistent with one another. This technique works well when the assessment tool has a large number of items. It is also applicable to scales and inventories, e.g., a Likert scale.

What statistics is used? A statistical analysis called Cronbach's alpha or the Kuder-Richardson is used to determine the internal consistency of the items. A Cronbach's alpha value of 0.60 and above indicates that the test items have internal consistency.
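A sketch of Cronbach's alpha, which reduces to KR-20 for right/wrong items, computed over an invented score matrix (numpy assumed):

```python
# Minimal sketch of Cronbach's alpha. Rows = examinees, columns = items
# (1 = right, 0 = wrong); the data are invented for illustration.
import numpy as np

scores = np.array([
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
])

k = scores.shape[1]                         # number of items
item_vars = scores.var(axis=0, ddof=1)      # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores

# alpha = (k / (k - 1)) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # 0.60 and above: internally consistent
```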

5 Inter-Rater Reliability

Inter-rater reliability is a way to measure the level of agreement between multiple raters or judges.

Method in Testing Reliability: Inter-rater reliability

How is this reliability done? This procedure is used to determine the consistency of multiple raters when using rating scales and rubrics to judge performance. The reliability here refers to the similar or consistent ratings provided by more than one rater or judge when they use an assessment tool. Inter-rater reliability is applicable when the assessment requires the use of multiple raters.

What statistics is used? A statistical analysis called Kendall's coefficient of concordance (Kendall's W) is used to determine whether the ratings provided by multiple raters agree with each other. A significant Kendall's W value indicates that the raters concur or agree with each other in their ratings.
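A sketch of Kendall's W for three hypothetical raters scoring five performances (numpy and scipy assumed; no tied ranks):

```python
# Minimal sketch of Kendall's coefficient of concordance (W). Rows are
# raters, columns are the performances rated; the ratings are invented.
import numpy as np
from scipy.stats import rankdata

ratings = np.array([
    [4, 1, 3, 2, 5],  # rater 1
    [5, 2, 3, 1, 4],  # rater 2
    [4, 1, 2, 3, 5],  # rater 3
])

m, n = ratings.shape  # m raters, n performances
ranks = np.array([rankdata(row) for row in ratings])
rank_sums = ranks.sum(axis=0)

# W = 12S / (m^2 (n^3 - n)), where S is the sum of squared deviations of
# the rank sums from their mean. W = 1 means perfect agreement, 0 none.
s = ((rank_sums - rank_sums.mean()) ** 2).sum()
w = 12 * s / (m**2 * (n**3 - n))
print(f"Kendall's W = {w:.2f}")
```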

Statistical Analysis

1. Linear regression
2. Computation of Pearson r
3. Difference between a positive and a negative correlation
4. Determining the strength of a correlation
5. Determining the significance of a correlation
1 Linear Regression

Linear regression is demonstrated when you have two variables that are measured, such as two sets of scores in a test taken at two different times by the same participants.
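A sketch of fitting the regression line for two such sets of scores, with invented data and scipy assumed:

```python
# Minimal sketch: fit a line relating two sets of scores from the same
# participants. The scores are invented for illustration.
from scipy.stats import linregress

time1 = [55, 70, 62, 88, 74, 91, 66, 80]  # first administration
time2 = [58, 72, 60, 85, 76, 89, 68, 78]  # second administration

result = linregress(time1, time2)
print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}")
print(f"r = {result.rvalue:.2f}")  # the correlation coefficient
```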
2 Computation of Pearson r

• The index of the linear relationship is called the correlation coefficient.
• When the points in a scatterplot tend to fall along the line, the correlation is said to be strong.
• When the direction of the scatterplot is directly proportional, the correlation coefficient will have a positive value.
• If the relationship is inverse, the correlation coefficient will have a negative value.

The statistical analysis used to determine the correlation coefficient is called the Pearson r.
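For concreteness, a sketch of Pearson r computed directly from its formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), with invented scores:

```python
# Minimal sketch of the Pearson r formula using only the standard library.
import math

x = [12, 15, 9, 18, 14, 20]
y = [11, 16, 10, 17, 13, 21]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Numerator: sum of cross-products of deviations from the means.
num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
# Denominator: square root of the product of the sums of squared deviations.
den = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) *
                sum((yi - mean_y) ** 2 for yi in y))

r = num / den
print(f"Pearson r = {r:.2f}")
```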
3 Difference Between a Positive and a Negative Correlation

Positive correlation. When the value of the correlation coefficient is positive, the higher the scores in X, the higher the scores in Y.

Negative correlation. When the value of the correlation coefficient is negative, the higher the scores in X, the lower the scores in Y, and vice versa.
4 Determining the Strength of a Correlation

The strength of a correlation also indicates the strength of the reliability of the test. It is indicated by the value of the correlation coefficient: the closer the value is to 1.00 or -1.00, the stronger the correlation.

0.80 – 1.00  Very strong relationship
0.60 – 0.79  Strong relationship
0.40 – 0.59  Substantial/marked relationship
0.20 – 0.39  Weak relationship
0.00 – 0.19  Negligible relationship
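A small hypothetical helper that maps a coefficient to the verbal labels in the table above (using the absolute value of r):

```python
# Minimal sketch: translate a correlation coefficient into the verbal
# interpretation from the table above.
def interpret_correlation(r: float) -> str:
    strength = abs(r)
    if strength >= 0.80:
        return "Very strong relationship"
    if strength >= 0.60:
        return "Strong relationship"
    if strength >= 0.40:
        return "Substantial/marked relationship"
    if strength >= 0.20:
        return "Weak relationship"
    return "Negligible relationship"

print(interpret_correlation(0.72))   # Strong relationship
print(interpret_correlation(-0.15))  # Negligible relationship
```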
5 Determining the Significance of a Correlation

• To make sure a correlation is free from error, it is tested for significance.
• When a correlation is significant, it means that the probability of the two variables being related is free of certain errors.
• To determine whether a correlation coefficient is significant, it is compared with a critical value, the expected probability of correlation coefficient values. When the computed value is greater than the critical value, the information obtained has more than a 95% chance of being correlated and is significant.
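In practice, the comparison with a critical value is usually expressed as a p-value; a sketch with invented scores and scipy assumed:

```python
# Minimal sketch: test a correlation for significance. A p-value below
# 0.05 corresponds to significance at the 95% confidence level.
from scipy.stats import pearsonr

x = [78, 85, 62, 90, 74, 88, 69, 95, 81, 73]
y = [80, 83, 65, 92, 70, 90, 72, 93, 79, 75]

r, p = pearsonr(x, y)
if p < 0.05:
    print(f"r = {r:.2f} is significant (p = {p:.4f})")
else:
    print(f"r = {r:.2f} is not significant (p = {p:.4f})")
```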
Test Validity

What is Test Validity?

A measure is valid when it measures what it is supposed to measure.

If a quarterly exam is valid, then its contents should directly measure the objectives of the curriculum. If a scale that measures personality is composed of five factors, then each of the five factors should be made up of items that are highly correlated. If an entrance exam is valid, it should predict students' grades after the first semester.
Different Ways to Establish Test Validity

Type of Validity: Content validity
Definition: When the items represent the domain being measured.
Procedure: The items are compared with the objectives of the program. The items need to directly measure the objectives (for achievement tests) or the definition (for scales). A reviewer conducts the checking.

Type of Validity: Face validity
Definition: When the test is presented well, free of errors, and administered well.
Procedure: The test items and layout are reviewed and tried out on a small group of respondents. A manual for administration can be made as a guide for the test administrator.
Type of Validity: Predictive validity
Definition: A measure should also predict a future criterion. An example is an entrance exam predicting the grades of the students after the first semester.
Procedure: A correlation coefficient is obtained where the X variable is used as the predictor and the Y variable as the criterion.
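A sketch of the predictor-criterion setup, with invented entrance-exam scores and grades (scipy assumed):

```python
# Minimal sketch of predictive validity: the entrance exam (X) predicts
# first-semester GPA (Y). All numbers are invented for illustration.
from scipy.stats import linregress

entrance_exam = [82, 75, 90, 68, 88, 79, 95, 71]          # predictor (X)
first_sem_gpa = [2.8, 2.5, 3.4, 2.1, 3.2, 2.7, 3.7, 2.3]  # criterion (Y)

result = linregress(entrance_exam, first_sem_gpa)
print(f"r = {result.rvalue:.2f}")  # strength of the predictive relationship

# Predict the GPA of a hypothetical applicant who scored 85.
predicted = result.slope * 85 + result.intercept
print(f"Predicted GPA = {predicted:.2f}")
```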

Type of Validity: Construct validity
Definition: The components or factors of the test should contain items that are strongly correlated.
Procedure: The Pearson r can be used to correlate the items for each factor. However, there is a technique called factor analysis to determine which items are highly correlated to form a factor.

Type of Validity: Concurrent validity
Definition: When two or more measures that assess the same characteristic are present for each examinee.
Procedure: The scores on the measures should be correlated.

Type of Validity: Convergent validity
Definition: When components or factors of a test are hypothesized to have a positive correlation.
Procedure: Correlation is done for the factors of the test.

Type of Validity: Divergent validity
Definition: When components or factors of a test are hypothesized to have a negative correlation. An example is to correlate the scores in a test on intrinsic and extrinsic motivation.
Procedure: Correlation is done for the factors of the test.

How to Determine if an Item is Easy or Difficult

An item is difficult if the majority of students are unable to provide the correct answer. An item is easy if the majority of the students are able to answer it correctly.

An item can discriminate if the examinees who score high on the test can answer more of the items correctly than examinees who got low scores.
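A sketch of the two indices with invented right/wrong responses; the upper and lower groups are the top and bottom scorers on the whole test:

```python
# Minimal sketch of item difficulty and item discrimination indices.
def difficulty_index(responses):
    """Proportion answering correctly; low values indicate a difficult item."""
    return sum(responses) / len(responses)

def discrimination_index(upper_group, lower_group):
    """Proportion correct in the high-scoring group minus the low-scoring
    group; positive values mean the item discriminates."""
    return difficulty_index(upper_group) - difficulty_index(lower_group)

item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]  # 1 = correct, 0 = wrong (invented)
upper = [1, 1, 1, 1, 0]                # responses of the top scorers
lower = [0, 1, 0, 0, 0]                # responses of the bottom scorers

print(f"Difficulty index = {difficulty_index(item):.2f}")
print(f"Discrimination index = {discrimination_index(upper, lower):.2f}")
```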
