
SUBMITTED BY:

FUENTEBLANCA GYKA J.
LEBECO, JOANNE

SUBMITTED TO:
ENG 106.1, SIR UBENIA
real lie yah bee lit tea

RELIABILITY
• Types of Reliability
• Sources of Reliability Evidence
• Reliability of Assessment Methods

RELIABILITY
- It refers to the reproducibility and consistency of methods and criteria.

- Reliability is expressed as a correlation coefficient.

- An assessment is said to be reliable if it produces the same results when given to the same examinee on two occasions.
TYPES OF RELIABILITY
• Internal Reliability
• External Reliability

• Internal Reliability
- It assesses the consistency of results across items within a test.

Are the results consistent within a test? If we split the test in half, do both halves show the same thing?
• External Reliability
- It gauges the extent to which a measure varies from one use to another.

Do we get the same results time after time using the same test?
SOURCES OF RELIABILITY EVIDENCE
A. Stability
B. Equivalence
C. Internal Consistency
D. Scorer or Rater Consistency
E. Decision Consistency
A. Stability
- Test-retest reliability correlates scores obtained from two administrations of the same test over a period of time.
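
To make the computation concrete, here is a minimal sketch in Python of how a test-retest (stability) coefficient could be obtained: the same test is given twice and the two sets of scores are correlated. The score lists are invented for illustration, not taken from any actual administration.

from statistics import mean

def pearson_r(x, y):
    # Pearson correlation between two equal-length score lists
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

first_admin = [32, 28, 40, 35, 22, 30]    # hypothetical scores, first administration
second_admin = [30, 27, 41, 33, 25, 31]   # same students, second administration
print(round(pearson_r(first_admin, second_admin), 2))   # test-retest reliability estimate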

B. Equivalence
- Parallel-forms reliability ascertains the equivalence of the forms.
- Two different versions of an assessment tool are administered to the same group of individuals.
C. Internal Consistency
- It implies that a student who has mastered the learning will get all or most of the items correct, while a student who knows little or nothing about the subject matter will get all or most of the items wrong.
The split-half method is done by dividing the test into two halves, typically separating the first and second halves of the test by odd- and even-numbered items, and then correlating the results of the two halves. The Spearman-Brown formula is then applied to estimate the reliability of the full-length test.

Split-half is effective for large questionnaires or tests with several items measuring the same construct.
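
A minimal sketch of the split-half procedure, assuming a small set of invented 0/1 item scores: the odd- and even-numbered items are summed separately, the two half-scores are correlated, and the Spearman-Brown formula projects the reliability of the whole test.

from statistics import mean

def pearson_r(x, y):
    # Pearson correlation between two equal-length lists
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

item_scores = [              # hypothetical 0/1 item scores, one row per student
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
odd_half = [sum(row[0::2]) for row in item_scores]    # items 1, 3, 5
even_half = [sum(row[1::2]) for row in item_scores]   # items 2, 4, 6

r_halves = pearson_r(odd_half, even_half)    # correlation between the two halves
r_full = 2 * r_halves / (1 + r_halves)       # Spearman-Brown corrected full-test reliability
print(round(r_halves, 2), round(r_full, 2))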
There are other ways to establish internal consistency:

• The Kuder-Richardson 20/21 formula is applicable to dichotomous items (0/1). Items of the test are scored 1 if marked correctly, otherwise zero.

• Cronbach's alpha is a better measure than split-half because it gives the average of all the split-half reliabilities. It measures how well items in a scale, e.g. (1 - strongly disagree to 5 - strongly agree), correlate with one another (Salvucci, et al., 1997).
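
For illustration, a minimal sketch of Cronbach's alpha on invented Likert-type responses; because KR-20 is the dichotomous special case, the same function also gives KR-20 when every item is scored 0/1.

from statistics import pvariance

def cronbach_alpha(responses):
    # responses: one list of item scores per respondent
    k = len(responses[0])                                          # number of items
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

likert = [            # hypothetical 1-5 ratings, one row per respondent
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(likert), 2))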
For internal consistency, reliability measures are rated as follows:

Less than 0.50 - reliability is low;
Between 0.50 and 0.80 - reliability is moderate; and
Greater than 0.80 - reliability is high
(Salvucci, et al., 1997).
D. Scorer or Rater Consistency

People do not necessarily rate in a similar way. They may disagree about how well responses or materials demonstrate knowledge of the construct or skill being assessed.

Bias - partiality or a display of prejudice in favor of or against a student or group.

Halo effect - a cognitive bias that allows first impressions to color one's judgment of another person's specific traits. It can be traced back to Edward Thorndike's 1920 study entitled "A Constant Error in Psychological Ratings."

Inter-rater reliability - the degree to which different raters, observers, or judges agree in their assessment decisions. There should be a good variation of products to be judged, scoring criteria should be clear, and raters should be knowledgeable or trained in how to use the observation instrument (McMillan, 2007). It can be estimated with Spearman's rho or Cohen's kappa.
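
As an illustration, a minimal sketch of Cohen's kappa for two raters judging the same set of products; the pass/fail labels are hypothetical.

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n   # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)  # chance agreement
    return (p_observed - p_expected) / (1 - p_expected)

rater_a = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(rater_a, rater_b), 2))    # 1.0 would be perfect agreement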
E. DECISION CONSISTENCY
- Describes how consistent the classification decisions are rather than how consistent the scores are (Nitko & Brookhart, 2011).

Levels of proficiency adopted in the Philippines at the onset of the K to 12 program:
Beginning (74% and below)
Developing (75% - 79%)
Approaching Proficiency (80% - 84%)
Proficient (85% - 89%)
Advanced (90% and above)
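
A minimal sketch of how decision consistency might be checked, assuming hypothetical paired scores from two parallel forms: each score is mapped to the proficiency levels above, and the proportion of matching classifications is reported.

def proficiency(score):
    # map a percentage score to the K to 12 proficiency levels listed above
    if score >= 90: return "Advanced"
    if score >= 85: return "Proficient"
    if score >= 80: return "Approaching Proficiency"
    if score >= 75: return "Developing"
    return "Beginning"

form_a = [92, 83, 78, 88, 70]    # hypothetical scores on form A
form_b = [90, 86, 77, 87, 74]    # same students on a parallel form B
same_decision = [proficiency(a) == proficiency(b) for a, b in zip(form_a, form_b)]
print(sum(same_decision) / len(same_decision))   # proportion of consistent decisions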
MEASUREMENT ERRORS
• Measurement errors can be caused by examinee-specific factors like fatigue, boredom, lack of motivation, momentary lapses of memory, and carelessness.
• Test-specific factors are also causes of measurement errors.
• Ambiguous questions would only elicit vague and varied responses.
• Classical test theory gives the formula:

X = T + E

where X is the observation, T is the true value, and E is the measurement error.
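
A minimal sketch, under the assumption of a known true score and normally distributed random error, of how the X = T + E model produces scattered observed scores.

import random

random.seed(1)
true_score = 40                                    # T: the examinee's true value
observations = [true_score + random.gauss(0, 2)    # E: random error with SD of 2
                for _ in range(5)]
print([round(x, 1) for x in observations])         # X: observed scores scatter around T
print(round(sum(observations) / len(observations), 1))   # averaging reduces random error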
RANDOM ERRORS
- Random errors are those that affect reliability, while systematic errors impact validity.

- Random errors, also called noise, produce random fluctuations in measurement scores.

STANDARD ERROR OF MEASUREMENT (SEM)

- It is an index of the expected variation of the observed scores due to measurement error.
- SEM pertains to the standard deviation of measurement errors associated with test scores.
- To compute it, the mean, the standard deviation (Sx), and the test score reliability (rxx) are obtained.

SEM Formula:

SEM = Sx √(1 - rxx)

A student's true score may then be estimated by relating the SEM to the normal curve, with the observed score (X) regarded as the mean.

• The confidence interval, or score band, indicates where the true score lies.

The score band X ± SEM gives a reasonable limit for estimating the true score. Suppose the standard deviation is 6.33 and the Cronbach alpha reliability estimate is 0.90, so the calculated SEM is 2. If a student receives a score of 32, we can be about 68% confident that his/her true score falls within one SEM of 32, i.e., between 30 and 34 (X ± SEM); 95% confident that it falls between 28 and 36 (X ± 2 SEM); and 99% confident that it is in the 26 to 38 range (X ± 3 SEM).
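
A minimal sketch reproducing the computation above: the SEM formula applied to the stated standard deviation (6.33) and reliability (0.90), then the 68%, 95%, and 99% score bands around an observed score of 32.

def sem(sd, reliability):
    # standard error of measurement: SEM = Sx * sqrt(1 - rxx)
    return sd * (1 - reliability) ** 0.5

s = sem(6.33, 0.90)                 # about 2.0
observed = 32
for k, confidence in [(1, "68%"), (2, "95%"), (3, "99%")]:
    low, high = observed - k * s, observed + k * s
    print(f"{confidence}: {low:.0f} to {high:.0f}")   # 30-34, 28-36, 26-38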
Systematic errors are referred to as bias.

Lengthening the test, or increasing the number of items, can increase reliability.

Random errors can be reduced by averaging multiple measurements; systematic errors can be reduced by identifying and removing the errors at the source.

Reliability indicates the extent to which scores are free from measurement errors.
RELIABILITY OF ASSESSMENT METHODS

• Performance assessment is said to have low reliability because of judgmental scoring (Miller, Linn & Gronlund, 2009; Harris, 1997). There is a limited sampling of course content (Harris, 1997).
• Reliable scoring of performance assessments can be enhanced by the use of analytic and topic-specific rubrics complemented with exemplars and/or rater training (Jonsson & Svingby, 2007).
• Direct observation data can be enhanced through inter-observer agreement and intra-observer reliability (Hintze, 2005).
• Self-assessment has high consistency across items and over a short period of time (Ross, 2006).

Below are ways to improve the reliability of assessment results (Nitko & Brookhart, 2011):

• Lengthen the assessment procedure by providing more time, more questions, and more tasks whenever practical.
• Broaden the scope of the procedure by assessing all the significant aspects of the target learning performance.
• Improve objectivity by using a systematic and more formal procedure for scoring students' performance. A scoring scheme or rubric would prove useful.
• Use multiple markers and check inter-rater reliability.
• Combine results from several assessments, especially when making crucial educational decisions.
• Provide students sufficient time to complete the assessment procedure.
• Teach students how to perform their best by providing practice and training and by motivating them.
• Match the assessment difficulty to the students' ability levels by providing tasks that are neither too easy nor too difficult, and by tailoring the assessment to each student's ability level when possible.
• Differentiate among students by selecting assessment tasks that distinguish or discriminate the best from the least able students.
THANK YOU!
