Professional Documents
Culture Documents
97-118
Printed in Great Britain. All rights reserved
CHAPTER 1
ABSTRACT
The recent rise of a number of new learning strategies has basically changed
the meaning of measurement in education and made new demands on the
construction, scoring, and analysis of educational tests. Educational
measurements s a t i s f y i n g these demands are usually called c r i t e r i o n - r e f e r e n c e d ,
while t r a d i t i o n a l measurements are often known as n o r m - r e f e r e n c e d .
97
98 Wire J. van der Linden
The above f i v e approaches were mentioned because they have been most
p o p u l a r so f a r or indicate promising developments. For a more e x h a u s t i v e
review, we r e f e r to Nitko (1980).
As noted above, random sampling of tests from item domains has been the
most p o p u l a r way of c r i t e r i o n - r e f e r e n c i n g test scores. However, when tests
are sampled and the concern is with domain scores and not with t r u e test
scores, classical test t h e o r y does not hold (unless all items in the domain
have equal d i f f i c u l t y ) . Brennan and Kane (1977a) proposed to use
g e n e r a l i z i b i l i t y t h e o r y when domain sampling takes place and transformed
L i v i n g s t o n ' s r e l i a b i l i t y coefficient into what t h e y called an index of
d e p e n d a b i l i t y . At the same time t h e y presented a modification of this index
which can be i n t e r p r e t e d as a signal-noise ratio f o r domain-referenced
measurements, while later two general agreement coefficients were given from
which these indices can be d e r i v e d as special instances (Brennan & Kane,
1977b; Kane & Brennan, 1980). A summary of these developments is given in
Brennan (1980).
Popham and Husek's diagnosis of the role of test score variance in the
analysis of c r i t e r i o n - r e f e r e n c e d measurement has been a h o t l y - d e b a t e d issue
(Haladyna, 1974a, 1974b; Millman E, Popham, 1974, Simon, 1969; Woodson,
1974a, 1974b). Our own opinion is t h at , although it has given rise to many
Criterion--Referenced Measurement 103
w o r t h w h i l e developments and comes close to a fundamental problem in test
t h e o r y , it is erroneous as f a r as the classical test model is concerned. A p a r t
from the requirement of f i n i t e variance, this model does not contain any
assumption with respect to score variance. (A full statement of these
assumptions can be found in, e . g . , Lord ~, Novick, 1968, section 3 . 1 . ) Low
v a r i a b i l i t y of test scores in c r i t e r i o n - r e f e r e n c e d measurement can t h e r e f o r e
never i n v a l i d a t e the classical test model. Latent t r a i t t h e o r y has called
attention to the more fundamental problem of p o p u l a t i o n - d e p e n d e n t item and
test parameters and offers models in which these are replaced by parameters
being not only v a r i a n c e - i n d e p e n d e n t b u t in d e p e n d e n t of any d i s t r i b u t i o n a l
characteristic ( e . g . , Birnbaum, 1968; Lord, 1980; Wright ~- Stone, 1979; f o r
an informal i n t r o d u c t i o n , see van der Linden, 1978b). In fact, Popham and
Husek's paper alludes to this problem but w r o n g l y claims it as an e x c l u s i v e
problem of c r i t e r i o n - r e f e r e n c e d measurement.
MASTERY DECISIONS
(1979a), and Wilcox and Harris (1977). For a discussion of the differences
between continuum and state models, we r e f e r to Meskauskas (1976) and van
der Linden (1978a).
Also relevant to the issue of mastery setting with continuum models are
techniques f o r setting mastery scores. These techniques all somehow
t r a n s l a t e the learning objectives into a mastery score on the t r u e score
continuum u n d e r l y i n g the test. Hence, they should be d i s t i n g u i s h e d from the
d e c i s i o n - t h e o r e t i c approaches mentioned above w h i c h , once the mastery score
on the continuum has been set, indicate how to optimally select the c u t - o f f
score on the test. Several techniques of setting mastery scores have been
proposed. Reviews of these techniques are g i v e n , e . g . , in Glass (1978),
Hambleton (1980), Jaeger (1979), and Shepard (1979, 1980). An approach
accounting f o r possible u n c e r t a i n t y in setting standards is given in de
G r u i j t e r (1980).
CONCLUDING REMARKS
REFERENCES
Block, J.H. (Ed.) Mastery Learning: Theory and Practice. New York:
Holt, Rinehart and Winston, 1971b.
Brennan, R.L. & Kane, M.T. An index of dependability for mastery tests.
Journal of Educational Measurement, I#, 227-289, 1977a.
Cox, R.C. & Vargas, J.S. A comparison of item selection techniques for
n o r m - r e f e r e n c e d and c r i t e r i o n - r e f e r e n c e d tests. Paper presented at the
Annual Meeting of the National Council on Measurement in Education,
Chicago, Illinois, 1966. (EDRS No. ED 010 517)
Cronbach, L.J. & Gleser, G.C. Psychological tests and personnel decisions,
(2nd E d . ) . Urbana, II1.: U n i v e r s i t y of Illinois Press, 1965.
Flannagan, J.C. Functional education for the seventies. Phi Delta Kappan,
zig, 27-32, 1967.
Koppelaar, H., van der Linden, W.J., & Mellenbergh, G.J. A computer
program for classification proportions in dichotomous decisions based on
dichotomously scored items. Tijdschrift voor Onderwijsresearch, 2, 32-37,
1977.
Mellenbergh, G.J. ~, van der Linden, W.J. The internal and external
optimality of decisions based on tests. Applied Psychological Measurement,
3, 257-273, 1979.
van der Linden, W.J. Het klassieke testmodel , latente trek modellen en
evaluatie-onderzoek (VOR Publikatie nr. 7). Amsterdam, The Netherlands:
V e r e n i g i n g r o a r Onderwijsresearch, 1978b.
van d e r Linden, W.J. Passing score and length of a mastery test. Paper
presented at the 11th European Mathematical Psychology Group Meeting,
Liege, Belgium, September 3-4, 1980b.
van der Linden, W.J. Simple estimates f o r the simple latent class mastery
testing model. Paper presented at the Fourth I n t e r n a t i o n a l Symposium on
Educational T e s t i n g , A n t w e r p , Belgium, June 24-27, 1980c.
van der Linden, W.J. The use' of moment estimators f o r mixtures of two
binomials with one known success parameter. Submitted f o r p u b l i c a t i o n ,
1981e.
Criterion - Referenced Measurement 117
van der Linden, W.J. ~, Mellenbergh, G.J. Optimal c u t t i n g scores using a
linear loss f u n c t i o n . Applied Psychological Measurement, 1, 593-599, 1977.
van der Linden, W.J. E, Mellenbergh, G.J. Coefficients for tests from a
decision t h e o r e t i c point of view. Applied Psychological Measurement, 2,
119-134, 1978.
Wilcox, R.R. A note on the length and passing score of a mastery test.
Journal of Educational Statistics, I, 359-364, 1976.
Wright, B.D. & Stone, M.H. Best test design: A handbook for Rasch
Measurement. Chicago, II1.: MESA Press, 1979.