
PSYCHOLOGICAL ASSESSMENT (PREFINALS)

RELIABILITY

Reliability is the degree to which an assessment tool produces stable and consistent results. In other words, a psychological test or assessment is considered valuable only if it yields the same results on each administration.

Example: A ruler gives the same result every time we use it to measure the length of an object, which is why it is considered reliable.

A reliability coefficient is an index of reliability: a proportion that indicates the ratio between the true score variance on a test and the total variance. In other words, the reliability coefficient is the numerical value of the reliability of a test; it expresses how reliable a test or an assessment is.

Usually, the reliability coefficient ranges from 0 to 1. As a rule of thumb, a coefficient below 0.70 means that the test is not reliable, while a coefficient from 0.70 to 1 indicates that it is reliable. In other words, unlike the true score and error of classical test theory, which cannot be observed directly, the reliability coefficient is numerical in nature.

THE CONCEPT OF RELIABILITY

Classical Test Theory holds that a score on a test reflects not only the true score of the test taker on the ability being measured but also error. The theory states that the observed score (X) is the sum of the true score (T) and the error (E). It means that what we get from a test is only the observed score, since the true score could be obtained only in the absence of error, which is not possible in the real world. For this reason, the true score and the error of an individual test taker cannot be computed directly.

CLASSICAL TEST THEORY FORMULA

X (observed score) = T (true score) + E (error)

THE SOURCES OF MEASUREMENT ERROR

Test Construction
Errors can arise in the process of test construction, which includes the test items and the item sampling itself. In other words, the items included in a test and how they were selected can produce measurement error. It is possible that the items included in a test do not really measure the desired construct, or that the test maker was not able to select the questions properly.

Test Administration
In test administration, factors such as the environment, the test taker, and the test administrator (examiner-related variables such as appearance and treatment) can become sources of error in a test or assessment.

Test Scoring and Interpretation
Errors in test scoring and interpretation are of two types: paper-and-pencil test errors and digital errors.

For digital errors, the most common contributing factors include the internet connection, the device being used, etc.

On the other hand, in a paper-and-pencil setting the primary issue is subjective scoring, also known as biased grading, which is commonly present during the scoring of projective tests.

INTERPRETING TEST RESULTS

A reliability coefficient can range from 0 to 1. The reliability of a test needs to be around 0.70 (70%) or above in order to be considered reliable (high reliability). High reliability indicates that the test is consistently measuring something (its desired variable), while low reliability means that the measurements made with the test are largely due to chance. Furthermore, a coefficient of exactly 1 indicates that the test has no error at all (which is impossible).

Additionally, if the reliability coefficient is lower than .70, a large share of the test variance (more than 30%) is error, which means that the test scores depend too much on chance.

High reliability is a requirement for high validity: a test cannot be valid without being reliable; however, it can be reliable without being valid. In simpler terms, for a test to be considered valid it needs to measure a variable accurately and with consistent precision, while a test can have a high level of precision but a low level of accuracy.

TYPES OF RELIABILITY

Test-Retest Reliability
It is an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test. Test-retest reliability is considered high if the scores from the initial and final testing correlate highly with each other.

However, when it comes to test-retest reliability, we have to consider the construct or variable being measured. If it is constantly changing, as states are, test-retest reliability would be useless.

Test-retest reliability often deals with test administration errors: since the test needs to be administered twice, it is prone to errors arising during the time period between the initial and final testing.

The statistical tools that can be used in this type of reliability are either Pearson r or Spearman rho.
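
The classical test theory formula and the test-retest procedure can be tried out on simulated data. The following is only a minimal sketch under stated assumptions: the true scores, the error spread, the sample size, and the helper names (`pearson_r`, `ranks`, `spearman_rho`) are all made up for illustration and do not come from any real test. Because the simulation sets the true score variance to 100 and the error variance to 25, the ratio of true score variance to total variance is expected to land near 0.80, and the correlation between the two administrations should come out close to the same value.

```python
import random
from math import sqrt
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


def ranks(values):
    """Ranks from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation of the ranked scores."""
    return pearson_r(ranks(x), ranks(y))


# Simulated data (assumption): X = T + E for two administrations of one test.
rng = random.Random(42)
n = 2000
true_scores = [rng.gauss(50, 10) for _ in range(n)]   # T, variance about 100
first = [t + rng.gauss(0, 5) for t in true_scores]    # X1 = T + E1
second = [t + rng.gauss(0, 5) for t in true_scores]   # X2 = T + E2

# Reliability as true score variance over total (observed) variance.
variance_ratio = pvariance(true_scores) / pvariance(first)

# Test-retest reliability: correlate the two administrations.
test_retest_r = pearson_r(first, second)
test_retest_rho = spearman_rho(first, second)

print(round(variance_ratio, 2), round(test_retest_r, 2), round(test_retest_rho, 2))
```

In a real study the true scores are never available, so only the correlation between the two administrations can be computed; the variance ratio is shown here only because the simulation knows T.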

Parallel Forms Reliability
It refers to the measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct, skill, knowledge base, etc.) to the same group of individuals. The scores from the two versions can then be correlated in order to evaluate the consistency of results across alternate versions.

In simpler terms, in this type of reliability the test taker needs to take different tests that measure the same construct.

Usually, test makers create a large pool of test questions, which they then split into two in order to have two different tests that measure the same construct.

Parallel Forms Reliability deals with errors from test administration, due to the time interval between the two testings, and from test construction, since it is possible that the items in the tests are not consistent with the construct to be measured.

Alternative-Form Reliability
It is the consistency of results among two or more different forms of a test.

Parallel Form and Alternative-Form Reliability are similar to each other, since both require a large number of test questions that measure the same construct, which are then split into two separate tests. However, in parallel forms the means of the test scores of both tests are identical, while in alternative forms the means of the two tests are not that similar to each other. The statistical tool used for both of them is either Spearman rho or Pearson r.

In Alternative-Form Reliability, the errors are mainly due to test administration, because of the time interval, and test construction, since the more test questions are formulated, the higher the risk of inconsistency.
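
The comparison between parallel and alternative forms can be sketched as a small check: compare the means of the two forms, then correlate the scores. The scores below are hypothetical, the `pearson_r` helper is written out for illustration, and the tolerance used to call two means "identical" (`MEAN_TOLERANCE`) is an assumption, not a rule from the notes.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


# Hypothetical scores of the same eight test takers on two forms of a test.
form_a = [12, 15, 9, 18, 14, 11, 16, 13]
form_b = [13, 14, 10, 17, 15, 10, 17, 12]

mean_a, mean_b = mean(form_a), mean(form_b)
r_ab = pearson_r(form_a, form_b)

# Assumed tolerance for treating the two means as "identical".
MEAN_TOLERANCE = 0.5
label = "parallel" if abs(mean_a - mean_b) <= MEAN_TOLERANCE else "alternative"

print(f"mean A = {mean_a}, mean B = {mean_b}, r = {r_ab:.2f}, forms look {label}")
```

With these made-up scores both means are 13.5 and the correlation is high, so the two forms behave like parallel forms; forms with clearly different means would be treated as alternative forms and correlated in the same way.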

Split-Half Reliability
In this type of reliability, a test for a single construct is split into two parts (odd-numbered and even-numbered items), and both parts are given to the same group at the same time. After that, the scores from both parts of the test are correlated. The difference between Parallel Form, Alternative-Form, and Split-Half Reliability is that the former two have to be administered with a time interval, while the latter does not need to be.

The errors in this type of reliability are mainly due to test construction, since the test has a large number of items, which increases the risk that some of them are inconsistent with the construct being measured.

In Split-Half Reliability, there is only a single test administration and two half-tests (odd and even). The statistical tool used in this type of reliability is Pearson r.

Inter-Rater Reliability
It is a measure of reliability used to assess the degree to which different judges or raters agree in their assessment decisions.

In this type of reliability, several raters (three, for example) are tasked to rate the test in question. If the data from the raters are in line with each other, the scoring of the test can be considered reliable.

In this type of reliability there is only a single test administration and a single test being assessed. Furthermore, most of the errors that occur in this type of reliability are due to test scoring and interpretation. The statistical tool that can be used in this type of reliability is Pearson r.

Internal Consistency
It is the type of reliability that refers to the degree of correlation among all items on a scale. It deals with the test items and how consistently they function, unlike Parallel Form and Alternative-Form Reliability, which deal with the performance of the test taker. It requires only a single administration.

It is useful in assessing the homogeneity of the test, that is, how good an item (question) is at measuring a single variable.

Kuder-Richardson 20 is a statistical tool used only if the test items are highly homogeneous, which means the test measures only a single variable. It is also used if the items differ in difficulty. Additionally, it is used if the test is binary, meaning each item is answerable only by yes or no.

However, if the level of difficulty is about the same for every item, the statistical tool that can be used is Kuder-Richardson 21.

Coefficient Alpha or Cronbach Alpha is appropriate for use on tests containing non-dichotomous items. It means that if the test is designed to be answered with more than two response options, this would be the proper tool for the job. An example would be a Likert Scale, which allows the test taker to answer the question with five or seven choices in place.

THE NATURE OF THE TEST

Closely related to considerations concerning the purpose and use of a reliability coefficient are those concerning the nature of the test itself. Included here are considerations such as whether:

1. The test is heterogeneous or homogeneous in nature. To specify further, homogeneous means the test is uniform throughout: it measures only one factor or variable. This indicates that the test is likely to have high internal consistency, since the items are most probably related to each other. On the other hand, if it is heterogeneous, the test measures more than one trait, which indicates that its internal consistency is likely to be low due to the presence of multiple types of questions that aim to measure different variables.

2. The characteristic, ability, or trait being measured is presumed to be dynamic or static. Dynamic refers to a characteristic that is ever changing, while static is the opposite.

Assuming that the variable to be measured is dynamic, the best type of reliability to use would be inter-item consistency, since it needs only a single administration. On the other hand, for static variables the best type of reliability would be test-retest, since it is built to measure constructs that do not change frequently.

3. The range of test scores is or is not restricted.

If the variance of either variable in a correlational analysis is restricted by the sampling procedure, then the resulting correlations tend to be lower. In simpler terms, if the participants were selected based on a criterion, so that their scores are more alike than usual, the correlation would likely be lower.

On the other hand, if the variance is inflated by the sampling procedure, then the correlation tends to be higher. It means that participants chosen randomly, whose scores are more spread out than those of a group selected on a criterion, would likely give a higher correlation.

4. The test is a speed or a power test. In a power test, the speed of the test taker in answering the test is not the focus; instead, the test focuses on the number of right answers that he/she managed to get. Usually, power tests are those that contain a hard set of questions.

On the other hand, in a speed test the factor being considered is the time limit. It usually has a uniform level of difficulty. The type of reliability to use could be Alternative-Forms Reliability or Test-Retest Reliability, since they mainly focus on the performance of the test taker.

5. The test is or is not a criterion-referenced test. If the test is criterion-referenced, the type of reliability to be used should be Inter-Rater Reliability, Parallel Form Reliability, or Alternative-Form Reliability. On the other hand, if the test is not criterion-referenced, all the types of reliability can be used; the choice would depend only on the construct being measured.

ALTERNATIVES TO THE TRUE SCORE MODEL

DOMAIN SAMPLING THEORY

It states that each item in a test is an independent sample of the trait or ability being measured. This theory states that if the reliability of the test is low, what needs to be done is to add more items to the test (double the number of items, to be specific).

The larger the sample of items, the more likely it is that the test will represent the true characteristics.
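
The domain sampling advice to double the number of items can be illustrated with a short calculation. The notes do not name a formula for this; the sketch below uses the Spearman-Brown prophecy formula, which is the usual way to project the reliability of a lengthened test, and it assumes the added items are comparable to the existing ones. The starting reliability is coefficient alpha (Cronbach Alpha) computed on a small made-up matrix of Likert responses. The data and the function names `cronbach_alpha` and `spearman_brown` are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha; each row holds one test taker's item scores."""
    k = len(rows[0])                  # number of items
    items = list(zip(*rows))          # transpose: one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def spearman_brown(reliability, length_factor):
    """Projected reliability when the test becomes length_factor times as long."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical answers of six test takers to four 5-point Likert items.
responses = [
    [4, 3, 5, 2],
    [2, 3, 2, 4],
    [5, 4, 3, 4],
    [3, 2, 3, 3],
    [1, 3, 2, 2],
    [4, 5, 4, 3],
]

alpha = cronbach_alpha(responses)
doubled = spearman_brown(alpha, 2)    # double the number of items

print(f"alpha = {alpha:.3f}, projected after doubling = {doubled:.3f}")
```

For this made-up data alpha is about 0.63, below the 0.70 rule of thumb, and the projected reliability after doubling the items is about 0.77. The projection holds only to the extent that the new items measure the same construct as the old ones.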
