FACULTY OF EDUCATION
CURRICULUM AND EDUCATIONAL TECHNOLOGY
SURABAYA STATE UNIVERSITY
2024
PREFACE
Praise be to Allah SWT, who has given His grace and guidance so that I
could complete this paper, entitled "Types, Accuracy and
Standardization Test Errors", on time. The purpose of this paper is to
fulfill an assignment in the Learning Evaluation course. In addition, this paper
aims to broaden insight into determining the objectives of learning analysis for
both readers and the writer. I would also like to thank all those who have shared
their knowledge so that I could complete this paper. I realize that this
paper is far from perfect. Therefore, constructive criticism and suggestions
are welcome for its improvement.
Table of Contents
PREFACE
Table of Contents
Theoretical Basis
CONCLUSION
BIBLIOGRAPHY
Theoretical Basis
The History Of Academic Aptitude Testing
The development of tests to predict school achievement began
in France at the beginning of the twentieth century. France had
embarked on a program of compulsory education, and the minister of
public instruction realized that not all French children had the
cognitive or mental potential to be able to benefit from instruction in
regular classes. “Special” classes were to be established for the
instruction of such children. Admission to these classes was to be
dependent on the results of a medical and psychological evaluation.
In 1905 Alfred Binet and his assistant, Theo Simon, were
commissioned to develop such a test. The aim of their test was to
measure a trait that would predict school achievement.
By the time of his death in 1911, Binet’s scale was widely used
and heralded as an “intelligence” test. English translations of the 1908
and 1911 revisions were made, and in 1916 a Stanford University
psychologist, Lewis Terman, standardized the Binet test on American
children and adults. That version of the test became known as the
Stanford–Binet Intelligence Scale or IQ test, and it was revised and/or
restandardized again in 1937, 1960, 1972, 1985, and 2003. What had
begun as a test designed to predict school achievement evolved into a
test of intelligence. Since Binet’s seminal work, several other
intelligence or IQ tests have been developed. Some, like the
Stanford–Binet, are designed to be administered individually (e.g.,
Wechsler Intelligence Scale for Children–IV, Wechsler Adult
Intelligence Scale–IV, Kaufman Assessment Battery for Children–II,
Slosson Intelligence Test); others are designed for group
administration (e.g., Cognitive Abilities Test, Otis–Lennon Mental
Ability Test, Kuhlmann–Anderson Intelligence Tests). Although each
of these tests is different from the Binet, each also has similarities and
correlates strongly with the Binet. Recall that tests that correlate
strongly measure much the same thing.
One of the more recent conceptions of intelligence (Sternberg, 1989,
2007) is that it can be defined by its underlying components and
altered through instruction. Thus, traits previously thought to be
inherited and unalterable could be taught.
A. Projective Personality Tests
*Rorschach Inkblot Technique, The Rorschach test is
published by Hans Huber Medical Publisher, Berne, Switzerland. It
consists of 10 cards or plates. Each card contains an inkblot, some
black and white, some colored. Examinees are asked to describe what
they “see” in the ambiguous blot. Responses are scored according to
location, content, and a variety of other factors, including whether
form or color was used to construct the image, whether movement is
suggested, and whether shading or texture was considered. Scoring is
complex, but acceptable validity is evident when properly trained
scorers are used. The Rorschach has been one of the most popular
projective tests used by clinical psychologists. In the hands of a
skilled clinician, it can yield a surprising variety of information.
*Thematic Apperception Test (TAT), The TAT is published by
the Harvard University Press and is designed for individuals aged 10
through adult. Adaptations of the test for younger children (Children’s
Apperception Test) and senior citizens (Senior Apperception Test) are
also available. The subject is presented with a series of pictures
(usually 10 or 12 of the total of 30) and asked to make up a story to fit
each picture. The pictures vary in degree of structure and ambiguity.
The record of stories is then examined to determine the projection of
the subject’s personality, as indicated by recurrent behavioral themes,
needs, perceived pressures, and so on. Interpretation is often complex
and requires an appropriately trained psychologist or psychiatrist.
B. ACCURACY OF STANDARDIZED TEST PRECISION
C. STANDARD ERROR OF MEASUREMENT
*Error Of Measurement, The standard error of measurement of
a test (abbreviated as Sm) is the standard deviation of the error scores
of a test. In the following calculations, Sm is the standard deviation of
the error score column. It is determined in the same way as
determining the standard deviation of any score distribution (see
Chapter 14). Review the following calculation to confirm this. We
will use the error scores from Table 18.1: 3, -7, -2, 5, 4, -3.
Step 1: Determine the average.
M = ΣX / N = 0 / 6 = 0
Step 2: Subtract the average from each error score to get the deviation
score. Square each deviation score and sum the squared deviations.
 X  -  M  =   x     x^2
+3  -  0  =   3       9
-7  -  0  =  -7      49
-2  -  0  =  -2       4
+5  -  0  =   5      25
+4  -  0  =   4      16
-3  -  0  =  -3       9
              Σx^2 = 112
Step 3: Put the sum of x^2 into the formula and solve for the standard
deviation.
Error Score SD = √(Σx^2 / N) = √(112 / 6) = √18.67 = 4.32 = Sm
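The three steps above can be checked with a short Python sketch. It uses the six error scores quoted from Table 18.1 and divides by N rather than N - 1, as the worked example does. The function name population_sd is only an illustrative choice and does not come from the source.

```python
import math

# Error scores quoted in the text from Table 18.1.
error_scores = [3, -7, -2, 5, 4, -3]


def population_sd(scores):
    """Standard deviation with N in the denominator, as in the worked example."""
    n = len(scores)
    mean = sum(scores) / n                          # Step 1: the average
    sum_sq = sum((x - mean) ** 2 for x in scores)   # Step 2: sum of squared deviations
    return math.sqrt(sum_sq / n)                    # Step 3: square root of the mean square


sm = population_sd(error_scores)
print(round(sm, 2))  # 4.32
```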
The standard deviation of the error score distribution, also
known as the standard error of measurement, is 4.32. If we could
know what the error score is for each test we take, we could calculate
Sm in this way. But, of course, we never know these error scores. If
you followed along here, your next question should be, "But how can
you determine the standard deviation of the error score if you never
know the error score?"
Fortunately, a fairly simple statistical formula can be used to estimate
this standard deviation (Sm) without having to actually know the error
score:
Sm = SD × √(1 - r)
where r is the reliability of the test and SD is the standard deviation of
the test.
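As a rough illustration, the estimation formula can be written as a small Python function. The values SD = 10 and r = 0.91 are hypothetical numbers chosen only to make the arithmetic easy to follow; they do not come from the source, and neither does the function name.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Estimate Sm = SD * sqrt(1 - r) from a test's SD and reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical test: SD = 10, reliability r = 0.91 (assumed values).
sm_example = standard_error_of_measurement(sd=10.0, reliability=0.91)
print(round(sm_example, 2))  # 3.0
```

Note how the formula behaves at the extremes: a perfectly reliable test (r = 1) gives Sm = 0, while a test with no reliability (r = 0) gives Sm equal to the full standard deviation.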
*Using the Standard Error of Measurement, Error scores are
assumed to be random. As such, they cancel each other out. That is,
obtained scores are inflated by random error about as often, and by
about as much, as they are deflated by it. Another way of saying this
is that the average error score for a test is zero. The distribution of
error scores is also important: it approximates the normal distribution
closely enough for us to use the normal distribution to represent it.
In summary, we know that error scores (1) are normally distributed,
(2) have a mean of zero, and (3) have a standard deviation referred to
as the standard error of measurement (Sm).
The distribution of error scores is a normal distribution. This is
important because, as you learned in Chapter 14, the normal
distribution has characteristics that allow us to make decisions about
scores that fall between, above, or below various points in the
distribution. We can do so because a fixed percentage of scores fall
between various score values in a normal distribution. Figure 18.3
should refresh your memory.
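The fixed-percentage property can be checked numerically. The sketch below assumes, as the text does, that error scores are normally distributed with a mean of zero, and it uses the Sm of 4.32 from the worked example. It relies on NormalDist from Python's standard statistics module; the variable names are illustrative only.

```python
from statistics import NormalDist

sm = 4.32  # standard error of measurement from the worked example

# Model the error scores as normal with mean 0 and standard deviation Sm.
errors = NormalDist(mu=0.0, sigma=sm)

# Share of error scores that fall within 1 Sm of zero, in either direction.
share_within_1_sm = errors.cdf(sm) - errors.cdf(-sm)
print(round(share_within_1_sm, 4))  # 0.6827
```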
FIGURE 18.1 Distribution of error scores.
Therefore, we can see that 68% of the error scores for the test will be
no more than 4.32 points higher or 4.32 points lower than the true
score. That is, if 100 scores were obtained on this test, about 68 of
them would differ from the true score by no more than 4.32 points in
either direction. Sm, then, tells us about the distribution of obtained
scores around the true score. By knowing a person's true score, we can
predict what his or her earned score is likely to be. Careful readers
might think, "That's not very useful information. We can never know
what someone's true score is, only their earned score." This is true.
As test users, we only work with the scores obtained.
However, we can follow our logic in reverse. If 68% of
earned scores are within 1 Sm of their true scores, then 68% of true
scores must be within 1 Sm of their earned scores. Strictly speaking,
this reverse logic is somewhat inaccurate when we consider individual
test scores. However, across all test scores, this will be true 99% of
the time (Gulliksen, 1987).
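The reverse logic amounts to placing a band of 1 Sm on either side of an earned score. A minimal Python sketch follows, using the Sm of 4.32 from the worked example and a hypothetical earned score of 90 that does not appear in the source. The function name score_band and its n_sm parameter are illustrative assumptions.

```python
def score_band(earned_score, sm, n_sm=1):
    """Return (low, high) bounds lying n_sm standard errors around an earned score."""
    margin = n_sm * sm
    return earned_score - margin, earned_score + margin


# Hypothetical earned score of 90 with Sm = 4.32 from the worked example.
low, high = score_band(earned_score=90, sm=4.32)
print(f"{low:.2f} to {high:.2f}")  # 85.68 to 94.32
```

Under the assumptions in the text, the true score would be expected to fall inside this band about 68% of the time.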
CONCLUSION
BIBLIOGRAPHY