t Tests see Catalogue of Parametric Tests

Tanimoto Coefficient see Proximity Measures

Tau-Equivalence see Conditional Standard Errors of Measurement


Tau-Equivalent and Congeneric Measurements

It is a well-known fact that the reliability of a test, defined as the ratio of true score to observed score variance, cannot generally be determined from a single test administration, but requires the use of a parallel test. More often than not, parallel tests are not available. In such cases, two approaches are popular for obtaining indirect information on the reliability of the test: either lower bounds to reliability can be used, or one may resort to hypotheses about the nature of the test parts.

Evaluating lower bounds to the reliability, such as Guttman's λ3 [6], better known as coefficient alpha [4], has gained wide popularity. A lower bound that is nearly always better than alpha is Guttman's λ4. It is the highest alpha that can be obtained by splitting the items into two parts (not necessarily of equal numbers) and treating those two parts as novel 'items'. Jackson and Agunwamba [8] proposed the greatest lower bound (glb) to reliability. It exceeds all conceivable lower bounds by exhaustively using the available information implied by the observed covariance matrix. A computational method for the glb has been proposed by Bentler and Woodward [3]; see also [19]. Computation of the glb has been implemented in EQS 6.

When lower bounds are high enough, the reliability has been shown adequate by implication. However, when lower bounds are low, they are of limited value. Also, some lower bounds to reliability involve a considerable degree of sampling bias. To avoid these problems, it is tempting to look to alternative approaches that introduce hypotheses on the nature of the test parts, from which the reliability can be determined at once. Two such hypotheses are well known in classical test theory.
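The two lower bounds discussed above, coefficient alpha and Guttman's λ4, lend themselves to a compact computational sketch. The following is an illustration under naming of my own (not code from the sources cited here); λ4 is found by brute-force enumeration of all two-part splits, which is feasible only for a modest number of items:

```python
import itertools
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha (Guttman's lambda3) from an n x k matrix of item scores."""
    k = X.shape[1]
    item_variances = X.var(axis=0, ddof=1)
    total_variance = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

def guttman_lambda4(X):
    """Guttman's lambda4: the highest alpha obtainable by splitting the items
    into two parts (not necessarily of equal size) and treating the two parts
    as novel 'items'."""
    k = X.shape[1]
    best = -np.inf
    for r in range(1, k // 2 + 1):
        for part in itertools.combinations(range(k), r):
            rest = [j for j in range(k) if j not in part]
            a = X[:, list(part)].sum(axis=1)
            b = X[:, rest].sum(axis=1)
            # alpha for the two 'items' a and b reduces to the split-half form
            split_alpha = 2 * (1 - (a.var(ddof=1) + b.var(ddof=1))
                               / (a + b).var(ddof=1))
            best = max(best, split_alpha)
    return best
```

The number of splits grows as 2^(k−1) − 1, so exhaustive enumeration is only practical for small k; for an even number of items, alpha is known to equal the average of all equal-split coefficients [4], so the maximum over splits can only be at least as high.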
Tau Equivalence

The first hypothesis is that of (essentially) tau-equivalent tests. Test parts X1, . . . , Xk are essentially tau-equivalent when, for i, j = 1, . . . , k,

Tj = Ti + aij.  (1)

This implies that the true scores of the test parts are equal up to an additive constant. When the additive constants are zero, the test parts are said to be tau-equivalent. Novick and Lewis [14] have shown that coefficient alpha is the reliability (instead of merely a lower bound to it) if and only if the test parts are essentially tau-equivalent.

Unfortunately, the condition for essential tau-equivalence to hold in practice is prohibitive: all covariances between test parts must be equal. This will only be observed when k = 2 or with contrived data. Moreover, the condition of equal covariances is necessary, but not sufficient, for essential tau-equivalence. For instance, let Y1, Y2, and Y3 be three uncorrelated variables with zero means and unit variances, and let the test parts be X1 = Y2 + Y3, X2 = Y1 + Y3, and X3 = Y1 + Y2. Then the covariance between any two test parts is 1, but the test parts are far from essentially tau-equivalent, which shows that having equal covariances is necessary but not sufficient for essential tau-equivalence. Because the necessary condition will never be satisfied in practice, coefficient alpha is best thought of as a lower bound (underestimate) to the reliability of a test.

Congeneric Tests

A weaker, and far more popular, hypothesis is that of a congeneric test, consisting of k test parts satisfying

Tj = cij Ti + aij,  (2)

which means that the test parts have perfectly correlated true scores. Equivalently, the test parts are assumed to fit a one-factor model. Essential tau-equivalence is more restricted because it requires that the weights cij in (2) are unity. For the case k = 3, Kristof has derived a closed-form expression for the reliability of the test, based on this hypothesis. It is always at least as high as alpha, and typically better [10]. Specifically, there are cases where it coincides with or even exceeds the greatest lower bound to reliability [20].

For k > 3, generalized versions of Kristof's coefficient have been proposed. For instance, Gilmer and Feldt [5] offered coefficients that could be evaluated without having access to powerful computers. They were fully aware that these coefficients would be supplanted by coefficients based on common factor analysis (see History of Factor Analysis: A Psychological Perspective) by the time large computers would be generally available. Nowadays, even the smallest of personal computers can evaluate the reliability of a test in the framework of common factor analysis, assuming that the one-factor hypothesis is true. For instance, McDonald [13] (see Jöreskog [9] for a similar method) proposed estimating the loadings on a single factor and evaluating the reliability as the ratio of the squared sum of loadings to the test variance. When k = 3, this yields Kristof's coefficient. Coefficients like Kristof's and McDonald's have been considered useful alternatives to lower bounds like the glb, because they aim to estimate, rather than underestimate, reliability, and they lack any reputation of sampling bias. However, much like the hypothesis of essential tau-equivalence, the one-factor hypothesis is problematic.

The Hypothesis of Congeneric Tests is Untenable for k > 3, and Undecided Otherwise

The hypothesis of congeneric tests relies on the existence of communalities that can be placed in the diagonal cells of the item covariance matrix in order to reduce the rank of that matrix to one. The conditions under which this is possible have been known for a long time. Spearman [17] already noted that unidimensionality is impossible (except in contrived cases) when k > 3. Accordingly, when k > 3, factor analysis with only one common factor will never give perfect fit. More generally, Wilson and Worcester [23], Guttman [7], and Bekker and De Leeuw [1] have argued that rank reduction of a covariance matrix by communalities does not carry a long way. Shapiro [15] has proven that the minimal reduced rank that can possibly be achieved will be at or above the Ledermann bound [11] almost surely. This means that the minimal reduced rank is almost surely at or above 1 when k = 3, at or above 2 when k = 4, at or above 3 when k = 5 or 6, and so on.
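The Ledermann bound referred to above has a closed form, φ(k) = (2k + 1 − √(8k + 1))/2, so the almost-sure lower bounds on the reduced rank quoted in the text are easy to tabulate (a small sketch; the function names are mine):

```python
import math

def ledermann_bound(k):
    """Closed form of the Ledermann bound for k variables:
    phi(k) = (2k + 1 - sqrt(8k + 1)) / 2."""
    return (2 * k + 1 - math.sqrt(8 * k + 1)) / 2

def almost_sure_minimal_rank(k):
    """Smallest integer at or above the Ledermann bound; by Shapiro's
    result, the minimal reduced rank is almost surely at or above this."""
    return math.ceil(ledermann_bound(k) - 1e-9)

for k in range(3, 8):
    print(k, almost_sure_minimal_rank(k))   # k = 3..7 -> 1, 2, 3, 3, 4
```

The values reproduce the pattern in the text: rank reduction to one is generically possible only for k = 3, and the achievable reduced rank climbs steadily with the number of items.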
The notion of 'almost surely' reflects the fact that, although covariance matrices that do have lower reduced rank are easily constructed, they will never be observed in practice. It follows that the hypothesis of congeneric tests is nearly as unrealistic as that of essential tau-equivalence. It may be true only when there are three or fewer items.

Even when reduction to rank 1 is possible, this is not sufficient for the hypothesis to be true: we merely have a necessary condition that is satisfied. The example of X1, X2, and X3 with three uncorrelated underlying factors Y1, Y2, and Y3, given above in the context of tau-equivalence, may again be used to demonstrate this: there are three underlying factors, yet communalities that reduce the rank to 1 do exist (being 1, 1, and 1). The bottom line is that the hypothesis of congeneric tests cannot be rejected, but still may be false, when k = 3 or less, and it has to be rejected when k > 3. 'Model-based coefficients are not useful if the models are not consistent with the empirical data' [2]. Reliability coefficients based on the single-factor hypothesis are indeed a case where this applies.

Sampling Bias

Lower bounds to reliability do not rest on any assumption other than that error scores of the test parts correlate only with themselves and with the observed scores they belong with. On the other hand, lower bounds do have a reputation for sampling bias. Whereas coefficient alpha tends to slightly underestimate the population alpha [21, 24], Guttman's λ4, and the greatest lower bound in particular, may grossly overestimate the population value when computed in small samples. For instance, when k = 10 and the population glb is 0.68, its average sample estimate may be as high as 0.77 in samples of size 100 [16].

It may seem that one-factor-based coefficients have a strong advantage here, but this is not true. When k = 3, Kristof's coefficient often coincides with the glb, and for k > 3, numerical values of McDonald's coefficient are typically very close to the glb. In fact, McDonald's coefficient displays the same sampling bias as the glb in Monte Carlo studies [20]. Because McDonald's and other factor-analysis-based coefficients behave very similarly to the glb and have the same bias problem, and, in addition, rely on a single-factor hypothesis that is either undecided or false, the glb is to be preferred.

Bias Correction of the glb

Although the glb seems superior to single-factor-based coefficients of reliability when the test is hypothesized to be unidimensional, this does not mean that the glb should be evaluated routinely for the single administration of an arbitrary test. The glb has gained little popularity, mainly because of the sampling bias problem. Bias correction methods are under construction [12, 22], but they have not yet reached the level of accuracy required for practical applications. The problem is especially severe when the number of items is large relative to the sample size. Until these bias problems are resolved, alpha will prevail as the lower bound to reliability.

Reliability versus Unidimensionality

Reliability is often confused with unidimensionality. A test can be congeneric, a property of the true score parts of the items, yet have large error variances, a property of the error parts of the items, and the reverse is also possible. Assessing the degree of unidimensionality is a matter of assessing how closely a single factor fits in common factor analysis. Ten Berge and Sočan [20] have proposed expressing unidimensionality as the percentage of common variance explained by a single factor, using the so-called Minimum Rank Factor Method of Ten Berge and Kiers [18]. However, this is a matter of taste, and others prefer goodness-of-fit measures derived from maximum-likelihood factor analysis.
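The capitalization on chance described under Sampling Bias is easy to reproduce in simulation. The following is an illustrative sketch, not a computation from [16] or [20]: the population (six mutually uncorrelated items, so the population λ4 is exactly zero), the sample size, and the replication count are arbitrary choices of mine, and λ4 stands in for the glb, which requires specialized optimization software:

```python
import itertools
import numpy as np

def lambda4(X):
    """Guttman's lambda4: best split-half bound over all two-part splits."""
    k = X.shape[1]
    best = -np.inf
    for r in range(1, k // 2 + 1):
        for part in itertools.combinations(range(k), r):
            rest = [j for j in range(k) if j not in part]
            a = X[:, list(part)].sum(axis=1)
            b = X[:, rest].sum(axis=1)
            best = max(best, 2 * (1 - (a.var(ddof=1) + b.var(ddof=1))
                                  / (a + b).var(ddof=1)))
    return best

# Population: six mutually uncorrelated items, so the population value of
# lambda4 (and the reliability) is exactly 0.  Maximizing over splits in
# small samples capitalizes on chance correlations between the halves.
rng = np.random.default_rng(1)
n_items, n_persons, n_replications = 6, 50, 200
estimates = [lambda4(rng.standard_normal((n_persons, n_items)))
             for _ in range(n_replications)]
print(np.mean(estimates))   # well above the population value of 0
```

Raising n_persons should pull the average back toward the population value, which is the sense in which the bias is a small-sample problem.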
References

[1] Bekker, P.A. & De Leeuw, J. (1987). The rank of reduced dispersion matrices, Psychometrika 52, 125–135.
[2] Bentler, P.M. (2003). Should coefficient alpha be replaced by model-based reliability coefficients? Paper presented at the 2003 Annual Meeting of the Psychometric Society, Sardinia.
[3] Bentler, P.M. & Woodward, J.A. (1980). Inequalities among lower bounds to reliability: with applications to test construction and factor analysis, Psychometrika 45, 249–267.
[4] Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests, Psychometrika 16, 297–334.
[5] Gilmer, J.S. & Feldt, L.S. (1983). Reliability estimation for a test with parts of unknown lengths, Psychometrika 48, 99–111.
[6] Guttman, L. (1945). A basis for analyzing test-retest reliability, Psychometrika 10, 255–282.
[7] Guttman, L. (1958). To what extent can communalities reduce rank? Psychometrika 23, 297–308.
[8] Jackson, P.H. & Agunwamba, C.C. (1977). Lower bounds for the reliability of the total score on a test composed of non-homogeneous items: I. Algebraic lower bounds, Psychometrika 42, 567–578.
[9] Jöreskog, K.G. (1971). Statistical analysis of sets of congeneric tests, Psychometrika 36, 109–133.
[10] Kristof, W. (1974). Estimation of reliability and true score variance from a split of the test into three arbitrary parts, Psychometrika 39, 245–249.
[11] Ledermann, W. (1937). On the rank of reduced correlation matrices in multiple factor analysis, Psychometrika 2, 85–93.
[12] Li, L. & Bentler, P.M. (2004). The greatest lower bound to reliability: corrected and resampling estimators. Paper presented at the 82nd symposium of the Behaviormetric Society, Tokyo.
[13] McDonald, R.P. (1970). The theoretical foundations of principal factor analysis, canonical factor analysis, and alpha factor analysis, British Journal of Mathematical and Statistical Psychology 23, 1–21.
[14] Novick, M.R. & Lewis, C. (1967). Coefficient alpha and the reliability of composite measurements, Psychometrika 32, 1–13.
[15] Shapiro, A. (1982). Rank reducibility of a symmetric matrix and sampling theory of minimum trace factor analysis, Psychometrika 47, 187–199.
[16] Shapiro, A. & Ten Berge, J.M.F. (2000). The asymptotic bias of minimum trace factor analysis, with applications to the greatest lower bound to reliability, Psychometrika 65, 413–425.
[17] Spearman, C.E. (1927). The Abilities of Man, Macmillan, London.
[18] Ten Berge, J.M.F. & Kiers, H.A.L. (1991). A numerical approach to the exact and the approximate minimum rank of a covariance matrix, Psychometrika 56, 309–315.
[19] Ten Berge, J.M.F., Snijders, T.A.B. & Zegers, F.E. (1981). Computational aspects of the greatest lower bound to reliability and constrained minimum trace factor analysis, Psychometrika 46, 357–366.
[20] Ten Berge, J.M.F. & Sočan, G. (2004). The greatest lower bound to the reliability of a test and the hypothesis of unidimensionality, Psychometrika 69, 611–623.
[21] Van Zijl, J.M., Neudecker, H. & Nel, D.G. (2000). On the distribution of the maximum likelihood estimator of Cronbach's alpha, Psychometrika 65, 271–280.
[22] Verhelst, N.D. (1998). Estimating the reliability of a test from a single test administration (Measurement and Research Department Report No. 98-2), Arnhem.
[23] Wilson, E.B. & Worcester, J. (1939). The resolution of six tests into three general factors, Proceedings of the National Academy of Sciences 25, 73–79.
[24] Yuan, K.-H. & Bentler, P.M. (2002). On robustness of the normal-theory based asymptotic distributions of three reliability coefficient estimates, Psychometrika 67, 251–259.

JOS M.F. TEN BERGE


Teaching Statistics to Psychologists

In the introduction to his 1962 classic statistical text, Winer [7] describes the role of the statistician in a research project as similar to that of an architect; that is, determining whether the efficacy of a new drug is superior to that of competing products is similar to designing a building with a particular purpose in mind, and in each case there is more than one possible solution. However, some solutions are more elegant than others, and the particulars of the situation, whether they are actual patient data or the size and placement of the building site, place boundaries on what can and cannot be accomplished.

What is the best way to teach the science and art of designing and conducting data analysis to today's graduate students in psychology? Perhaps history can be our guide, and so we begin by first asking what graduate education in statistics was like when today's senior faculty were students, then ask what is current common practice, and finally ask what the future might hold for tomorrow's graduate students. I propose the following:

• Graduate training in statistics has greatly changed over the last few decades in ways that are both helpful and harmful to students attempting to master statistical methodology.
• The major factor in this change is the development of computerized statistical packages (see Software for Statistical Analyses) which, when used in graduate education, cause students trained in experimental design to be more broadly but less thoroughly trained.