The SEM can be used to create confidence intervals, also called
confidence bands, around an observed score. Computer-generated score
reports often display these bands.
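
As a minimal sketch of how such a band could be computed, the snippet below assumes the usual classical test theory formula, SEM = SD * sqrt(1 - reliability), which this summary does not state explicitly. The function names and the example values (SD = 15, reliability = 0.91, observed score = 110) are illustrative assumptions, not taken from any particular test.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(observed, sd, reliability, z=1.0):
    """Band of +/- z SEM around an observed score.

    z = 1.0 gives roughly a 68% band, z = 1.96 roughly a 95% band,
    assuming normally distributed measurement error.
    """
    half_width = z * sem(sd, reliability)
    return observed - half_width, observed + half_width


# Illustrative (hypothetical) values, not from a real test manual.
low, high = confidence_band(110, sd=15, reliability=0.91, z=1.96)
print(f"SEM = {sem(15, 0.91):.2f}, 95% band = {low:.1f} to {high:.1f}")
```

The band is centered on the observed score here, which is the simplest convention; some reports center it on an estimated true score instead.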
Appropriate Units for SEM
The standard error of measurement should be expressed in the score units
used for interpretation. If the interpretation employs normed scores, the
SEM in raw score units needs to be converted to those normed units.
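
One way to sketch the conversion, assuming the normed score is a linear transformation of the raw score (as with standard scores such as T-scores), is to rescale the raw SEM by the ratio of the two standard deviations. The function name and the numbers are hypothetical, and the shortcut does not apply to nonlinear norms such as percentile ranks.

```python
def convert_sem(raw_sem, raw_sd, norm_sd):
    """Rescale a raw-score SEM to normed-score units.

    Assumes a linear raw-to-norm transformation, so the SEM scales
    by the same factor as the standard deviation.
    """
    if raw_sd <= 0 or norm_sd <= 0:
        raise ValueError("standard deviations must be positive")
    return raw_sem * (norm_sd / raw_sd)


# Hypothetical example: raw SD = 6, raw SEM = 3, normed scale with SD = 15.
print(convert_sem(raw_sem=3.0, raw_sd=6.0, norm_sd=15.0))  # 7.5
```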
Standard Errors: Three Types
The SEM is different from the standard error of the mean and the standard
error of estimate. The standard error of measurement is the standard
deviation of a hypothetical population of observed scores distributed
around the true score of an individual. The standard error of the mean is
the standard deviation of a hypothetical population of sample means
around the population mean. The standard error of estimate is the standard
deviation of actual Y scores around the predicted Y scores when predicted
from X.
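
To make the contrast concrete, the sketch below puts the three textbook formulas side by side. The formulas are the standard ones and are not spelled out in this summary; the function names and input values are illustrative assumptions.

```python
import math


def standard_error_of_measurement(sd_x, reliability):
    """Spread of observed scores around one person's true score."""
    return sd_x * math.sqrt(1.0 - reliability)


def standard_error_of_mean(sd_x, n):
    """Spread of sample means (samples of size n) around the population mean."""
    return sd_x / math.sqrt(n)


def standard_error_of_estimate(sd_y, r_xy):
    """Spread of actual Y scores around the Y predicted from X."""
    return sd_y * math.sqrt(1.0 - r_xy ** 2)


# Hypothetical inputs: each function needs a different second quantity
# (a reliability, a sample size, a predictor-criterion correlation).
print(standard_error_of_measurement(10.0, 0.84))
print(standard_error_of_mean(10.0, 25))
print(standard_error_of_estimate(10.0, 0.60))
```

The three formulas share a standard deviation but differ in what they combine it with, which is the quickest way to tell them apart.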
Some Special Issues in Reliability
Reliability in Interpretive Reports
Narrative reports are not readily adapted to the tools of reliability analysis.
The impression could arise that reliability is not an issue, but it always is.
The reader of a narrative report must therefore be familiar with the
reliability information given about the test. Finally, every narrative
report should incorporate the concept of SEM.
Reliability of Subscores and Individual Items
Reliability information must be provided for the score that is actually
being interpreted. One cannot assume, for example, that individual items
have the same reliability as the total score of a test.
Reliability in Item Response Theory
The standard error (SE) in IRT is often referred to as an index of the
precision of measurement. Unlike the SEM in CTT, the SE in IRT does not
depend on the homogeneity or heterogeneity of the test items.
Generalizability Theory
GT is an attempt to assess many sources of unreliability at the same time.
In GT the true score is referred to as a universe score or domain score.
The person’s universe score is the average score across all occasions,
forms and scorers. Generalizability theory is divided into G-studies and
D-studies. The G-study analyses the components of variance, including
interactions. The D-study uses results of the G-study to decide how the
measurement might be improved by changes in one of the components.
GT offers an exceptionally useful framework for thinking about the
reliability of measures, but it is not widely used in practice because
the required studies are complicated and time-consuming to conduct.
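
As a rough illustration of the D-study idea, the sketch below assumes the simplest case: a one-facet crossed design (persons by occasions) in which a G-study has already estimated a person variance component and a residual component. The variance values and the function name are hypothetical; a real analysis would estimate the components from data and usually involve more facets.

```python
def g_coefficient(var_person, var_residual, n_conditions):
    """Generalizability coefficient for relative decisions in a one-facet design.

    Averaging over n_conditions (e.g. occasions) divides the residual
    variance by n_conditions.
    """
    if n_conditions < 1:
        raise ValueError("n_conditions must be at least 1")
    relative_error = var_residual / n_conditions
    return var_person / (var_person + relative_error)


# Hypothetical variance components from a G-study.
var_person, var_residual = 4.0, 6.0

# D-study question: how much does adding occasions improve the measurement?
for n in (1, 2, 4, 8):
    print(n, round(g_coefficient(var_person, var_residual, n), 3))
```

With these made-up components the coefficient rises from 0.4 with one occasion to about 0.84 with eight, which is the kind of trade-off a D-study is meant to expose.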
Factors Affecting Reliability Coefficients
The fact that correlation is a matter of relative position rather than
absolute scores is not a significant concern for reliability. Curvilinearity is
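also not an issue for reliability data. Heteroscedasticity, however, is
very much a problem for the SEM, because a single SEM assumes the same
amount of error at every score level. Group variability is also often a
problem when interpreting reliability data: a more homogeneous group
tends to yield a lower reliability coefficient.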
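
The effect of group variability can be sketched with a commonly used adjustment that estimates the reliability expected in a group with a different standard deviation. It assumes the SEM stays the same across the two groups, which is itself an assumption that heteroscedasticity can undermine. The function name and the values are illustrative only.

```python
def reliability_for_new_group(r_old, sd_old, sd_new):
    """Estimate reliability in a group with a different SD.

    Assumes error variance (SEM squared) is identical in both groups:
    r_new = 1 - (sd_old**2 * (1 - r_old)) / sd_new**2
    """
    if sd_old <= 0 or sd_new <= 0:
        raise ValueError("standard deviations must be positive")
    return 1.0 - (sd_old ** 2 * (1.0 - r_old)) / (sd_new ** 2)


# Hypothetical case: reliability 0.90 reported for a group with SD = 15,
# applied to a more homogeneous group with SD = 10.
print(round(reliability_for_new_group(0.90, sd_old=15.0, sd_new=10.0), 3))  # 0.775
```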
How High Should Reliability Be?
In summary, reliability is always important. However, validity is more
important than reliability: a test can be highly reliable without being
valid at all.