
VOL. 56, No. 2, MARCH, 1959

Psychological Bulletin

Northwestern University
University of Chicago

1 The new data analyses reported in this paper were supported by funds from the Graduate School of Northwestern University and by the Department of Psychology of the University of Chicago. We are also indebted to numerous colleagues for their thoughtful criticisms and encouragement of an earlier draft of this paper, especially Benjamin S. Bloom, R. Darrell Bock, Desmond S. Cartwright, Loren J. Chapman, Lee J. Cronbach, Carl P. Duncan, Lyle V. Jones, Joe Kamiya, Wilbur L. Layton, Jane Loevinger, Paul E. Meehl, Marshall H. Segall, Thornton B. Roby, Robert C. Tryon, Michael Wertheimer, and Robert F. Winch.

In the cumulative experience with measures of individual differences over the past 50 years, tests have been accepted as valid or discarded as invalid by research experiences of many sorts. The criteria suggested in this paper are all to be found in such cumulative evaluations, as well as in the recent discussions of validity. These criteria are clarified and implemented when considered jointly in the context of a multitrait-multimethod matrix. Aspects of the validational process receiving particular emphasis are these:

1. Validation is typically convergent, a confirmation by independent measurement procedures. Independence of methods is a common denominator among the major types of validity (excepting content validity) insofar as they are to be distinguished from reliability.

2. For the justification of novel trait measures, for the validation of test interpretation, or for the establishment of construct validity, discriminant validation as well as convergent validation is required. Tests can be invalidated by too high correlations with other tests from which they were intended to differ.

3. Each test or task employed for measurement purposes is a trait-method unit, a union of a particular trait content with measurement procedures not specific to that content. The systematic variance among test scores can be due to responses to the measurement features as well as responses to the trait content.

4. In order to examine discriminant validity, and in order to estimate the relative contributions of trait and method variance, more than one trait as well as more than one method must be employed in the validation process. In many instances it will be convenient to achieve this through a multitrait-multimethod matrix. Such a matrix presents all of the intercorrelations resulting when each of several traits is measured by each of several methods.

To illustrate the suggested validational process, a synthetic example is

TABLE 1

                   Method 1             Method 2             Method 3
Traits          A1    B1    C1       A2    B2    C2       A3    B3    C3

Method 1   A1    …
           B1   .51 (.89)
           C1   .38   .37 (.76)

Method 2   A2    …     …     …        …
           B2    …     …     …        …     …
           C2    …     …     …        …     …     …

Method 3   A3    …     …     …       .67   .42   .33    (.94)
           B3    …     …     …       .43   .66   .34     .67 (.92)
           C3    …     …     …        …    .32   .58     .58   .60 (.85)

[Entries shown as … are not legible in this copy.]

Note.-The validity diagonals are the three sets of italicized values. The reliability diagonals are the three sets of values in parentheses. Each heterotrait-monomethod triangle is enclosed by a solid line. Each heterotrait-heteromethod triangle is enclosed by a broken line.
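The region labels used for Table 1 can be stated as a simple rule on pairs of variables. The following is a minimal sketch, not part of the original paper: it assumes each variable is written as a (trait, method) pair, and the function name `mtmm_region` is purely illustrative.

```python
from itertools import combinations


def mtmm_region(var_a, var_b):
    """Name the region of a multitrait-multimethod matrix that holds r(var_a, var_b).

    Each variable is a (trait, method) pair, e.g. ("A", 1) for Trait A by Method 1.
    This encoding is an assumption made for illustration.
    """
    (trait_a, method_a), (trait_b, method_b) = var_a, var_b
    same_trait = trait_a == trait_b
    same_method = method_a == method_b
    if same_trait and same_method:
        return "reliability diagonal (monotrait-monomethod)"
    if same_method:
        return "heterotrait-monomethod triangle"
    if same_trait:
        return "validity diagonal (monotrait-heteromethod)"
    return "heterotrait-heteromethod triangle"


# Three traits by three methods give nine variables and 36 distinct pairs.
variables = [(trait, method) for method in (1, 2, 3) for trait in "ABC"]
counts = {}
for a, b in combinations(variables, 2):
    region = mtmm_region(a, b)
    counts[region] = counts.get(region, 0) + 1

for region, n in sorted(counts.items()):
    print(f"{n:2d}  {region}")
```

For a three-trait, three-method design this counts 9 heterotrait-monomethod values, 9 validity values, and 18 heterotrait-heteromethod values, in addition to the 9 reliabilities on the main diagonal.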

presented in Table 1. This illustration involves three different traits, each measured by three methods, generating nine separate variables. It will be convenient to have labels for various regions of the matrix, and such have been provided in Table 1. The reliabilities will be spoken of in terms of three reliability diagonals, one for each method. The reliabilities could also be designated as the monotrait-monomethod values. Adjacent to each reliability diagonal is the heterotrait-monomethod triangle. The reliability diagonal and the adjacent heterotrait-monomethod triangle make up a monomethod block. A heteromethod block is made up of a validity diagonal (which could also be designated as monotrait-heteromethod values) and the two heterotrait-heteromethod triangles lying on each side of it. Note that these two heterotrait-heteromethod triangles are not identical.

In terms of this diagram, four aspects bear upon the question of validity. In the first place, the entries in the validity diagonal should be significantly different from zero and sufficiently large to encourage further examination of validity. This requirement is evidence of convergent validity. Second, a validity diagonal value should be higher than the values lying in its column and row in the heterotrait-heteromethod triangles. That is, a validity value for a variable should be higher than the correlations obtained between that variable and any other variable having neither trait nor method in common. This requirement may seem so minimal and so obvious as to not need stating, yet an inspection of the literature shows that it is frequently not met,
and may not be met even when the validity coefficients are of substantial size. In Table 1, all of the validity values meet this requirement. A third common-sense desideratum is that a variable correlate higher with an independent effort to measure the same trait than with measures designed to get at different traits which happen to employ the same method. For a given variable, this involves comparing its values in the validity diagonals with its values in the heterotrait-monomethod triangles. For variables A1, B1, and C1, this requirement is met to some degree. For the other variables, A2, A3, etc., it is not met and this is probably typical of the usual case in individual differences research, as will be discussed in what follows. A fourth desideratum is that the same pattern of trait interrelationship be shown in all of the heterotrait triangles of both the monomethod and heteromethod blocks. The hypothetical data in Table 1 meet this requirement to a very marked degree, in spite of the different general levels of correlation involved in the several heterotrait triangles. The last three criteria provide evidence for discriminant validity.

Before examining the multitrait-multimethod matrices available in the literature, some explication and justification of this complex of requirements seems in order.

Convergence of independent methods: the distinction between reliability and validity. Both reliability and validity concepts require that agreement between measures be demonstrated. A common denominator which most validity concepts share in contradistinction to reliability is that this agreement represent the convergence of independent approaches. The concept of independence is indicated by such phrases as "external variable," "criterion performance," "behavioral criterion" (American Psychological Association, 1954, pp. 13-15) used in connection with concurrent and predictive validity. For construct validity it has been stated thus: "Numerous successful predictions dealing with phenotypically diverse 'criteria' give greater weight to the claim of construct validity than do ... predictions involving very similar behavior" (Cronbach & Meehl, 1955, p. 295). The importance of independence recurs in most discussions of proof. For example, Ayer, discussing a historian's belief about a past event, says "if these sources are numerous and independent, and if they agree with one another, he will be reasonably confident that their account of the matter is correct" (Ayer, 1954, p. 39). In discussing the manner in which abstract scientific concepts are tied to operations, Feigl speaks of their being "fixed" by "triangulation in logical space" (Feigl, 1958, p. 401).

Independence is, of course, a matter of degree, and in this sense, reliability and validity can be seen as regions on a continuum. (Cf. Thurstone, 1937, pp. 102-103.) Reliability is the agreement between two efforts to measure the same trait through maximally similar methods. Validity is represented in the agreement between two attempts to measure the same trait through maximally different methods. A split-half reliability is a little more like a validity coefficient than is an immediate test-retest reliability, for the items are not quite identical. A correlation between dissimilar subtests is probably a reliability measure, but is still closer to the region called validity.

Some evaluation of validity can take place even if the two methods

are not entirely independent. In Table 1, for example, it is possible that Methods 1 and 2 are not entirely independent. If underlying Traits A and B are entirely independent, then the .10 minimum correlation in the heterotrait-heteromethod triangles may reflect method covariance. What if the overlap of method variance were higher? All correlations in the heteromethod block would then be elevated, including the validity diagonal. The heteromethod block involving Methods 2 and 3 in Table 1 illustrates this. The degree of elevation of the validity diagonal above the heterotrait-heteromethod triangles remains comparable and relative validity can still be evaluated. The interpretation of the validity diagonal in an absolute fashion requires the fortunate coincidence of both an independence of traits and an independence of methods, represented by zero values in the heterotrait-heteromethod triangles. But zero values could also occur through a combination of negative correlation between traits and positive correlation between methods, or the reverse. In practice, perhaps all that can be hoped for is evidence for relative validity, that is, for common variance specific to a trait, above and beyond shared method variance.

Discriminant validation. While the usual reason for the judgment of invalidity is low correlations in the validity diagonal (e.g., the Downey Will-Temperament Test [Symonds, 1931, p. 337ff]) tests have also been invalidated because of too high correlations with other tests purporting to measure different things. The classic case of the social intelligence tests is a case in point. (See below and also [Strang, 1930; R. Thorndike, 1936].) Such invalidation occurs when values in the heterotrait-heteromethod triangles are as high as those in the validity diagonal, or even where within a monomethod block, the heterotrait values are as high as the reliabilities. Loevinger, Gleser, and DuBois (1953) have emphasized this requirement in the development of maximally discriminating subtests.

When a dimension of personality is hypothesized, when a construct is proposed, the proponent invariably has in mind distinctions between the new dimension and other constructs already in use. One cannot define without implying distinctions, and the verification of these distinctions is an important part of the validational process. In discussions of construct validity, it has been expressed in such terms as "from this point of view, a low correlation with athletic ability may be just as important and encouraging as a high correlation with reading comprehension" (APA, 1954, p. 17).

The test as a trait-method unit. In any given psychological measuring device, there are certain features or stimuli introduced specifically to represent the trait that it is intended to measure. There are other features which are characteristic of the method being employed, features which could also be present in efforts to measure other quite different traits. The test, or rating scale, or other device, almost inevitably elicits systematic variance in response due to both groups of features. To the extent that irrelevant method variance contributes to the scores obtained, these scores are invalid.

This source of invalidity was first noted in the "halo effects" found in ratings (Thorndike, 1920). Studies of individual differences among laboratory animals resulted in the recognition of "apparatus factors," usually more dominant than psychological process factors (Tryon, 1942). For paper-and-pencil tests, methods variance has been noted under such terms as "test-form factors" (Vernon, 1957, 1958) and "response sets" (Cronbach, 1946, 1950; Lorge, 1937). Cronbach has stated the point particularly clearly: "The assumption is generally made ... that what the test measures is determined by the content of the items. Yet the final score ... is a composite of effects resulting from the content of the item and effects resulting from the form of the item used" (Cronbach, 1946, p. 475). "Response sets always lower the logical validity of a test.... Response sets interfere with inferences from test data" (p. 484).

While E. L. Thorndike (1920) was willing to allege the presence of halo effects by comparing the high obtained correlations with common sense notions of what they ought to be (e.g., it was unreasonable that a teacher's intelligence and voice quality should correlate .63) and while much of the evidence of response set variance is of the same order, the clear-cut demonstration of the presence of method variance requires both several traits and several methods. Otherwise, high correlations between tests might be explained as due either to basic trait similarity or to shared method variance. In the multitrait-multimethod matrix, the presence of method variance is indicated by the difference in level of correlation between the parallel values of the monomethod block and the heteromethod blocks, assuming comparable reliabilities among all tests. Thus the contribution of method variance in Test A1 of Table 1 is indicated by the elevation of r_A1B1 above r_A1B2, i.e., the difference between .51 and .22, etc.

The distinction between trait and method is of course relative to the test constructor's intent. What is an unwanted response set for one tester may be a trait for another who wishes to measure acquiescence, willingness to take an extreme stand, or tendency to attribute socially desirable attributes to oneself (Cronbach, 1946, 1950; Edwards, 1957; Lorge, 1937).

MULTITRAIT-MULTIMETHOD MATRICES

Multitrait-multimethod matrices are rare in the test and measurement literature. Most frequent are two types of fragment: two methods and one trait (single isolated values from the validity diagonal, perhaps accompanied by a reliability or two), and heterotrait-monomethod triangles. Either type of fragment is apt to disguise the inadequacy of our present measurement efforts, particularly in failing to call attention to the preponderant strength of methods variance. The evidence of test validity to be presented here is probably poorer than most psychologists would have expected.

One of the earliest matrices of this kind was provided by Kelley and Krey in 1934. Peer judgments by students provided one method, scores on a word-association test the other. Table 2 presents the data for the four most valid traits of the eight he employed. The picture is one of strong method factors, particularly among the peer ratings, and almost total invalidity. For only one of the eight measures, School Drive, is the value in the validity diagonal (.16) higher than all of the heterotrait-heteromethod values. The absence of discriminant validity is further indicated by the tendency of the values in the monomethod triangles to approximate the reliabilities.

TABLE 2

                          Peer Ratings               Association Test
                       A1    B1    C1    D1       A2    B2    C2    D2
Peer Ratings
  Courtesy      A1   (.82)
  Honesty       B1    .74  (.80)
  Poise         C1    .63   .65  (.74)
  School Drive  D1    .76   .78   .65  (.89)
Association Test
  Courtesy      A2    .13   .14   .10   .14     (.28)
  Honesty       B2    .06   .12   .16   .08      .27  (.38)
  Poise         C2    .01   .08   .10   .02      .19   .37  (.42)
  School Drive  D2    .12   .15   .14   .16      .27   .32   .18  (.36)

An early illustration from the animal literature comes from Anderson's (1937) study of drives. Table 3 presents a sample of his data. Once again, the highest correlations are found among different constructs from the same method, showing the dominance of apparatus or method factors so typical of the whole field of individual differences. The validity diagonal for hunger is higher than the heteroconstruct-heteromethod values. The diagonal value for sex has not been italicized as a validity coefficient since the obstruction box measure was pre-sex-opportunity, the activity wheel post-opportunity. Note that the high general level of heterotrait-heteromethod values could be due either to correlation of methods variance between the two methods, or to correlated trait variance. On a priori grounds, however, the methods would seem about as independent as one would be likely to achieve. The predominance of an apparatus factor for the activity wheel is evident from the fact that the correlation between hunger and thirst

TABLE 3

                     Obstruction Box          Activity Wheel
                     A1    B1    C1          A2    B2    C2
Obstruction Box
  Hunger      A1   (.58)
  Thirst      B1    .54  (   )
  Sex         C1    .46   .70  (   )
Activity Wheel
  Hunger      A2    .48   .31   .37        (.83)
  Thirst      B2    .35   .33   .43         .87  (.92)
  Post Sex    C2    .31   .37   .44         .69   .78  (   )

Note.-Empty parentheses appear in this and subsequent tables where no appropriate reliability estimates are reported in the original paper.

TABLE 4

                                                        Memory      Comprehension    Vocabulary
                                                       A1    B1       A2    B2       A3    B3
Social Intelligence (Memory for Names & Faces)   A1   (   )
Mental Alertness (Learning Ability)              B1    .31  (   )
Social Intelligence (Sense of Humor)             A2    .30   .31    (   )
Mental Alertness (Comprehension)                 B2    .29   .38     .48  (   )
Social Intelligence (Recog. of Mental State)     A3    .23   .35     .31   .35    (   )
Mental Alertness (Vocabulary)                    B3    .30   .58     .40   .48     .47  (   )

(.87) is of the same magnitude as their test-retest reliabilities (.83 and .92 respectively).

R. L. Thorndike's study (1936) of the validity of the George Washington Social Intelligence Test is the classic instance of invalidation by high correlation between traits. It involved computing all of the intercorrelations among five subscales of the Social Intelligence Test and five subscales of the George Washington Mental Alertness Test. The model of the present paper would demand that each of the traits, social intelligence and mental alertness, be measured by at least two methods. While this full symmetry was not intended in the study, it can be so interpreted without too much distortion. For both traits, there were subtests employing acquisition of knowledge during the testing period (i.e., learning or memory), tests involving comprehension of prose passages, and tests that involved a definitional activity. Table 4 shows six of Thorndike's 10 variables arranged as a multitrait-multimethod matrix. If the three subtests of the Social Intelligence Test are viewed as three methods of measuring social intelligence, then their intercorrelations (.30, .23, and .31) represent validities that are not only lower than their corresponding monomethod values, but also lower than the heterotrait-heteromethod correlations, providing a picture which totally fails to establish social intelligence as a separate dimension. The Mental Alertness validity diagonals (.38, .58, and .48) equal or exceed the monomethod values in two out of three cases, and exceed all heterotrait-heteromethod control values. These results illustrate the general conclusions reached by Thorndike in his factor analysis of the whole 10 X 10 matrix.

The data of Table 4 could be used to validate specific forms of cognitive functioning, as measured by the different "methods" represented by usual intelligence test content on the one hand and social content on the other. Table 5 rearranges the 15 values for this purpose. The monomethod values and the validity diagonals exchange places, while the heterotrait-heteromethod control coefficients are the same in both tables.
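The exchange of roles between Table 4 and Table 5 can be checked mechanically. The sketch below is an illustration added here, not the authors' procedure: it describes each of the six variables by a (content, task) pair, with the short labels "social", "abstract", "memory", and so on chosen for convenience, and groups the 15 correlations of Table 4 first with content as the trait and then with task as the trait.

```python
# Each Table 4 variable described by (content, task). Labels are illustrative.
VARS = {
    "A1": ("social", "memory"),
    "B1": ("abstract", "memory"),
    "A2": ("social", "comprehension"),
    "B2": ("abstract", "comprehension"),
    "A3": ("social", "vocabulary"),
    "B3": ("abstract", "vocabulary"),
}

# The 15 correlations of Table 4, keyed by (row variable, column variable).
R = {
    ("B1", "A1"): .31,
    ("A2", "A1"): .30, ("A2", "B1"): .31,
    ("B2", "A1"): .29, ("B2", "B1"): .38, ("B2", "A2"): .48,
    ("A3", "A1"): .23, ("A3", "B1"): .35, ("A3", "A2"): .31, ("A3", "B2"): .35,
    ("B3", "A1"): .30, ("B3", "B1"): .58, ("B3", "A2"): .40, ("B3", "B2"): .48,
    ("B3", "A3"): .47,
}


def regions(trait_index):
    """Group the 15 values by matrix region.

    trait_index selects which descriptor in VARS plays the role of the trait
    (0 = content, 1 = task); the other descriptor is treated as the method.
    """
    method_index = 1 - trait_index
    out = {"validity": [], "heterotrait-monomethod": [], "heterotrait-heteromethod": []}
    for (x, y), r in R.items():
        same_trait = VARS[x][trait_index] == VARS[y][trait_index]
        same_method = VARS[x][method_index] == VARS[y][method_index]
        if same_trait and not same_method:
            out["validity"].append(r)
        elif same_method and not same_trait:
            out["heterotrait-monomethod"].append(r)
        else:
            out["heterotrait-heteromethod"].append(r)
    return {name: sorted(values) for name, values in out.items()}


table4_view = regions(trait_index=0)  # content as trait, task as method
table5_view = regions(trait_index=1)  # task as trait, content as method

print(table5_view["validity"])                  # [0.31, 0.47, 0.48]
print(table4_view["heterotrait-heteromethod"])  # [0.29, 0.3, 0.31, 0.35, 0.35, 0.4]
```

Under the second grouping the three validity values are .31, .47, and .48, which are the monomethod values of the first grouping, and the six heterotrait-heteromethod values are unchanged.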


TABLE 5

                                                       Social Content        Abstract Content
                                                      A1    B1    C1         A2    B2    C2
Social Content
  Memory (Memory for Names and Faces)           A1   (   )
  Comprehension (Sense of Humor)                B1    .30  (   )
  Vocabulary (Recognition of Mental State)      C1    .23   .31  (   )
Abstract Content
  Memory (Learning Ability)                     A2    .31   .31   .35      (   )
  Comprehension                                 B2    .29   .48   .35       .38  (   )
  Vocabulary                                    C2    .30   .40   .47       .58   .48  (   )

As judged against these latter values, comprehension (.48) and vocabulary (.47), but not memory (.31), show some specific validity. This transmutability of the validation matrix argues for the comparisons within the heteromethod block as the most generally relevant validation data, and illustrates the potential interchangeability of trait and method components.

Some of the correlations in Chi's (1937) prodigious study of halo effect in ratings are appropriate to a multitrait-multimethod matrix in which each rater might be regarded as representing a different method. While the published report does not make these available in detail because it employs averaged values, it is apparent from a comparison of his Tables IV and VIII that the ratings generally failed to meet the requirement that ratings of the same trait by different raters should correlate higher than ratings of different traits by the same rater. Validity is shown to the extent that of the correlations in the heteromethod block, those in the validity diagonal are higher than the average heteromethod-heterotrait values.

A conspicuously unsuccessful multitrait-multimethod matrix is provided by Campbell (1953, 1956) for rating of the leadership behavior of officers by themselves and by their subordinates. Only one of 11 variables (Recognition Behavior) met the requirement of providing a validity diagonal value higher than any of the heterotrait-heteromethod values, that validity being .29. For none of the variables were the validities higher than heterotrait-monomethod values.

A study of attitudes toward authority and nonauthority figures by Burwen and Campbell (1957) contains a complex multitrait-multimethod matrix, one symmetrical excerpt from which is shown in Table 6. Method variance was strong for most of the procedures in this study. Where validity was found, it was primarily at the level of validity diagonal values higher than heterotrait-heteromethod values. As illustrated in Table 6, attitude toward father showed this kind of validity, as did attitude toward peers to a lesser degree. Attitude toward boss showed no validity. There was no evidence of a generalized attitude toward authority which would include father and boss, although such values as the

TABLE 6

                       Interview            Trait Check-List
                    A1    B1    C1         A2    B2    C2
Interview
  Father     A1   (   )
  Boss       B1    .64  (   )
  Peer       C1    .65   .76  (   )
Trait Check-List
  Father     A2    .40   .08   .09       (.24)
  Boss       B2    .19  -.10  -.03        .23  (.34)
  Peer       C2    .27   .11   .23        .21   .45  (.55)
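The second requirement (a validity value higher than the heterotrait-heteromethod values in its row and column) can be tallied directly for the heteromethod block of Table 6. The following is a minimal sketch added for illustration; the names `BLOCK` and `validity_comparisons` are hypothetical, and simple counting of comparisons is an assumption of this sketch rather than a procedure given in the paper.

```python
# Heteromethod block of Table 6. Rows are the Trait Check-List variables
# (A2, B2, C2); columns are the Interview variables (A1, B1, C1).
TRAITS = ["Father", "Boss", "Peer"]
BLOCK = [
    [.40, .08, .09],
    [.19, -.10, -.03],
    [.27, .11, .23],
]


def validity_comparisons(block, i):
    """Return the validity value for trait i and the heterotrait-heteromethod
    values that share its row or its column."""
    n = len(block)
    validity = block[i][i]
    controls = [block[i][j] for j in range(n) if j != i]   # same row
    controls += [block[j][i] for j in range(n) if j != i]  # same column
    return validity, controls


# For each trait, count how many of the four control values the validity exceeds.
summary = {}
for i, trait in enumerate(TRAITS):
    validity, controls = validity_comparisons(BLOCK, i)
    summary[trait] = sum(validity > c for c in controls)

print(summary)  # {'Father': 4, 'Boss': 0, 'Peer': 3}
```

The tally agrees with the reading given in the text: the father validity exceeds all four control values, the peer validity exceeds three of four, and the boss validity exceeds none.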

.64 correlation between father and boss as measured by interview might have seemed to confirm the hypothesis had they been encountered in isolation.

Borgatta (1954) has provided a complex multimethod study from which can be extracted Table 7, illustrating the assessment of two traits by four different methods. For all measures but one, the highest correlation is the apparatus one, i.e., with the other trait measured by the same method rather than with the same trait measured by a different method. Neither of the traits finds
TABLE 7
(N = 125)

                                      Sociometric                    Observation
                               by Others      by Self       Group Interaction   Role Playing
                               A1    B1       A2    B2         A3    B3          A4    B4
Sociometric by Others
  Popularity           A1    (   )
  Expansiveness        B1     .47  (   )
Sociometric by Self
  Popularity           A2     .19   .18     (   )
  Expansiveness        B2     .07   .08      .32  (   )
Observation of Group Interaction
  Popularity           A3     .25   .18      .26   .11       (   )
  Expansiveness        B3     .21   .12      .28   .15        .84  (   )
Observation of Role Playing
  Popularity           A4     .24   .14      .18   .01        .66   .58        (   )
  Expansiveness        B4     .25   .12      .26   .05        .66   .76         .73  (   )
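The minimal tetrad comparison applied to Table 7 (sum of the two validity values against the sum of the two control values, for each pair of methods) can be reproduced from the table. The sketch below is an added illustration; the dictionary layout and the function name `tetrad` are assumptions made here, not part of Borgatta's or the authors' analysis.

```python
from itertools import combinations

# Lower triangle of Table 7. Each row lists its correlations with the
# variables that precede it in ORDER.
ORDER = ["A1", "B1", "A2", "B2", "A3", "B3", "A4", "B4"]
ROWS = {
    "B1": [.47],
    "A2": [.19, .18],
    "B2": [.07, .08, .32],
    "A3": [.25, .18, .26, .11],
    "B3": [.21, .12, .28, .15, .84],
    "A4": [.24, .14, .18, .01, .66, .58],
    "B4": [.25, .12, .26, .05, .66, .76, .73],
}

# Symmetric lookup of every correlation.
R = {}
for row, values in ROWS.items():
    for col, value in zip(ORDER, values):
        R[(row, col)] = R[(col, row)] = value


def tetrad(m1, m2):
    """Return (validity sum, control sum) for the two traits under methods m1, m2."""
    validity = R[(f"A{m1}", f"A{m2}")] + R[(f"B{m1}", f"B{m2}")]
    control = R[(f"A{m1}", f"B{m2}")] + R[(f"B{m1}", f"A{m2}")]
    return validity, control


passing = []
for m1, m2 in combinations((1, 2, 3, 4), 2):
    validity, control = tetrad(m1, m2)
    if validity > control:
        passing.append((m1, m2))

print(passing)  # [(1, 2), (2, 3), (3, 4)]
```

Three of the six method pairs satisfy the comparison, which is the count reported in the text.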

any consistent validation by the requirement that the validity diagonals exceed the heterotrait-heteromethod control values. As a most minimal requirement, it might be asked if the sum of the two values in the validity diagonal exceeds the sum of the two control values, providing a comparison in which differences in reliability or communality are roughly partialled out. This condition is achieved at the purely chance level of three times in the six tetrads. This matrix provides an interesting range of methodological independence. The two "Sociometric by Others" measures, while representing the judgments of the same set of fellow participants, come from distinct tasks: Popularity is based upon each participant's expression of his own friendship preferences, while Expansiveness is based upon each participant's guesses as to the other participant's choices, from which has been computed each participant's reputation for liking lots of other persons, i.e., being "expansive." In line with this considerable independence, the evidence for a method factor is relatively low in comparison with the observational procedures. Similarly, the two "Sociometric by Self" measures represent quite separate tasks, Popularity coming from his estimates of the choices he will receive from others, Expansiveness from the number of expressions of attraction to others which he makes on the sociometric task. In contrast, the measures of Popularity and Expansiveness from the observations of group interaction and the role playing not only involve the same specific observers, but in addition the observers rated the pair of variables as a part of the same rating task in each situation. The apparent degree of method variance within each of the two observational situations, and the apparent sharing of method variance between them, is correspondingly high.

In another paper by Borgatta (1955), 12 interaction process variables were measured by quantitative observation under two conditions, and by a projective test. In this test, the stimuli were pictures of groups, for which the S generated a series of verbal interchanges; these were then scored in Interaction Process Analysis categories. For illustrative purposes, Table 8 presents the five traits which had the highest mean communalities in the over-all factor analysis. Between the two highly similar observational methods, validation is excellent: trait variance runs higher than method variance; validity diagonals are in general higher than heterotrait values of both the heteromethod and monomethod blocks, most unexceptionably so for Gives Opinion and Gives Orientation. The pattern of correlation among the traits is also in general confirmed.

Of greater interest because of the greater independence of methods are the blocks involving the projective test. Here the validity picture is much poorer. Gives Orientation comes off best, its projective test validity values of .35 and .33 being bested by only three monomethod values and by no heterotrait-heteromethod values within the projective blocks. All of the other validities are exceeded by some heterotrait-heteromethod value.

TABLE 8

                                     Free Behavior                   Role Playing                   Projective Test
                             A1    B1    C1    D1    E1      A2    B2    C2    D2    E2      A3    B3    C3    D3    E3
Free Behavior
  Shows solidarity     A1   ( )
  Gives suggestion     B1   .25   ( )
  Gives opinion        C1   .13   .24   ( )
  Gives orientation    D1  -.14   .26   .52   ( )
  Shows disagreement   E1   .34   .41   .27   .02   ( )
Role Playing
  Shows solidarity     A2   .43   .43   .08   .10   .29     ( )
  Gives suggestion     B2   .16   .32   .00   .24   .07     .37   ( )
  Gives opinion        C2   .15   .27   .60   .38   .12     .01   .10   ( )
  Gives orientation    D2  -.12   .24   .44   .74   .08     .04   .18   .40   ( )
  Shows disagreement   E2   .51   .36   .14  -.12   .50     .39   .27   .23  -.11   ( )
Projective Test
  Shows solidarity     A3   .20   .17   .16   .12   .08     .17   .12   .30   .17   .22     ( )
  Gives suggestion     B3   .05   .21   .05   .08   .13     .10   .19  -.02   .06   .30     .32   ( )
  Gives opinion        C3   .31   .30   .13  -.02   .26     .25   .19   .15  -.04   .53     .31   .63   ( )
  Gives orientation    D3  -.01   .09   .30   .35  -.05     .03   .00   .19   .33   .00     .37   .29   .32   ( )
  Shows disagreement   E3   .13   .18   .10   .14   .19     .22   .28   .02   .04   .23     .27   .51   .47   .30   ( )

The projective test specialist may object to the implicit expectations of a one-to-one correspondence between projected action and overt action. Such expectations should not be attributed to Borgatta, and are not necessary to the method here proposed. For the simple symmetrical model of this paper, it has been assumed that the measures are labeled in correspondence with the correlations expected, i.e., in correspondence with the traits that the tests are alleged to diagnose. Note that in Table 8, Gives Opinion is the best projective test predictor of both free behavior and role playing Shows Disagreement. Were a proper theoretical rationale available, these values might be regarded as validities.

Mayo (1956) has made an analysis of test scores and ratings of effort and intelligence, to estimate the contribution of halo (a kind of methods variance) to ratings. As Table 9 shows, the validity picture is ambiguous. The method factor or halo effect for ratings is considerable although the correlation between the two ratings (.66) is well below their reliabilities (.84 and .85). The objective measures share no appreciable apparatus overlap because they were independent operations. In spite of Mayo's argument that the ratings have some valid trait variance, the .46 heterotrait-heteromethod value seriously depreciates the otherwise impressive .46 and .40 validity values.

TABLE 9
(N = 166)

                          Peer Ratings      Objective Measures
                          A1    B1            A2    B2
Peer Rating
  Intelligence     A1   (.85)
  Effort           B1    .66  (.84)
Objective Measures
  Intelligence     A2    .46   .29          (   )
  Effort           B2    .46   .40           .10  (   )

Cronbach (1949, p. 277) and Vernon (1957, 1958) have both discussed the multitrait-multimethod matrix shown in Table 10, based upon data originally presented by H. S. Conrad. Using an approximative technique, Vernon estimates that 61% of the systematic variance is due to a general factor, that 21½% is due to the test-form factors specific to verbal or to pictorial forms of items, and that but 11½% is due to the content factors specific to electrical or to mechanical contents. Note that for the purposes of estimating validity, the interpretation of the general factor, which he estimates from the .49 and .45 heterotrait-heteromethod values, is equivocal. It could represent desired competence variance, representing components common to both electrical and mechanical skills, perhaps resulting from general industrial shop experience, common ability components, overlapping learning situations, and the like. On the other hand, this general factor could represent overlapping method factors, and be due to the presence in both tests of multiple choice item format, IBM answer sheets, or the heterogeneity of the Ss in conscientiousness, test-taking motivation, and test-taking sophistication. Until methods that are still more different and traits that are still more independent are introduced into the validation matrix, this general factor remains uninterpretable. From this standpoint it can be seen that 21½% is a very minimal estimate of the total test-form variance in the tests, as it represents only test-form components specific to the verbal or the pictorial items, i.e., test-form components which the two forms do not share. Similarly, and more hopefully, the 11½% content variance is a very minimal estimate of the total true trait variance of the tests, representing only the true trait variance which electrical and mechanical knowledge do not share.

TABLE 10

                             Verbal Items      Pictorial Items
                             A1    B1            A2    B2
Verbal Items
  Mechanical Facts    A1   (.89)
  Electrical Facts    B1    .63  (.71)
Pictorial Items
  Mechanical Facts    A2    .61   .45          (.82)
  Electrical Facts    B2    .49   .51           .64  (.67)

Carroll (1952) has provided data on the Guilford-Martin Inventory of Factors STDCR and related ratings which can be rearranged into the matrix of Table 11. (Variable R has been inverted to reduce the number of negative correlations.) Two of the methods, Self Ratings and Inventory scores, can be seen as sharing method variance, and thus as having an inflated validity diagonal. The more independent heteromethod blocks involving Peer Ratings show some evidence of discriminant and convergent validity, with validity diagonals averaging .33 (Inventory X Peer Ratings) and .39 (Self Ratings X Peer Ratings) against heterotrait-heteromethod control values averaging .14 and .16. While not intrinsically impressive, this picture is nonetheless better than most of the validity matrices here assembled. Note that the Self Ratings show slightly higher validity diagonal elevations than do the Inventory scores, in spite of the much greater length and undoubtedly higher reliability of the latter. In addition, a method factor seems almost totally lacking for the Self Ratings, while strongly present for the Inventory, so that the Self Ratings come off much the best if true trait variance is expressed as a proportion of total reliable variance (as Vernon [1958] suggests). The method factor in the STDCR Inventory is undoubtedly enhanced by scoring the same item in several scales, thus contributing correlated error variance, which could be reduced without loss of reliability by the simple expedient of adding more equivalent items and scoring each item in only one scale. It should be noted that Carroll makes explicit use of the comparison of the validity diagonal with the heterotrait-heteromethod values as a validity indicator.

The illustrations of multitrait-multimethod matrices presented so far give a rather sorry picture of the validity of the measures of individual differences involved. The typical case shows an excessive amount of

Table 11

                      Inventory                    Self Ratings                  Peer Ratings
                 S     T     D     C    -R     S     T     D     C    -R     S     T     D     C    -R
Inventory
  S          (.92)
  T            .27 (.89)
  D            .62   .57 (.91)
  C            .36   .47   .90 (.91)
  -R           .69   .32   .28  -.06 (.89)
Self Ratings
  S            .57   .11   .19  -.01   .53   ( )
  T            .28   .65   .42   .26   .37   .26   ( )
  D            .44   .25   .53   .45   .29   .31   .32   ( )
  C            .31   .20   .54   .52   .13   .11   .21   .47   ( )
  -R           .15   .30   .12   .04   .34   .10   .12   .04   .06   ( )
Peer Ratings
  S            .37   .08   .10  -.01   .38   .42   .02   .08   .08   .31 (.81)
  T            .23   .32   .15   .04   .40   .20   .39   .40   .21   .31   .37 (.66)
  D            .31   .11   .27   .24   .25   .17   .09   .29   .27   .30   .49   .38 (.73)
  C            .08   .15   .20   .26  -.05   .01   .06   .14   .30   .07   .19   .16   .40 (.75)
  -R           .21   .20  -.03  -.16   .45   .28   .17   .08   .01   .56   .55   .56   .34  -.07 (.76)
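The block averages quoted for Table 11 can be checked directly from the two heteromethod blocks that involve Peer Ratings. The following is a minimal sketch, not part of the original analysis: the names `INVENTORY_X_PEER`, `SELF_X_PEER`, and `block_summary` are illustrative assumptions, and the values are transcribed from the Peer Ratings rows of the table.

```python
from statistics import mean

# Heteromethod blocks transcribed from the Peer Ratings rows of Table 11.
# Rows: Peer Ratings traits (S, T, D, C, -R); columns: the other method's traits.
INVENTORY_X_PEER = [
    [.37, .08, .10, -.01, .38],
    [.23, .32, .15, .04, .40],
    [.31, .11, .27, .24, .25],
    [.08, .15, .20, .26, -.05],
    [.21, .20, -.03, -.16, .45],
]
SELF_X_PEER = [
    [.42, .02, .08, .08, .31],
    [.20, .39, .40, .21, .31],
    [.17, .09, .29, .27, .30],
    [.01, .06, .14, .30, .07],
    [.28, .17, .08, .01, .56],
]


def block_summary(block):
    """Return (mean validity diagonal, mean heterotrait-heteromethod value)."""
    n = len(block)
    diagonal = [block[i][i] for i in range(n)]
    off_diagonal = [block[i][j] for i in range(n) for j in range(n) if i != j]
    return mean(diagonal), mean(off_diagonal)


for label, block in (("Inventory x Peer", INVENTORY_X_PEER),
                     ("Self x Peer", SELF_X_PEER)):
    validity, control = block_summary(block)
    print(f"{label}: validity {validity:.2f}, heterotrait-heteromethod {control:.2f}")
```

Rounded to two places, the sketch gives .33 against .14 for Inventory × Peer Ratings and .39 against .16 for Self Ratings × Peer Ratings, in agreement with the averages cited in the text.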
method variance, which usually exceeds the amount of trait variance. This picture is certainly not as a result of a deliberate effort to select shockingly bad examples: these are ones we have encountered without attempting an exhaustive coverage of the literature. The several unpublished studies of which we are aware show the same picture. If they seem more disappointing than the general run of validity data reported in the journals, this impression may very well be because the portrait of validity provided by isolated values plucked from the validity diagonal is deceptive, and uninterpretable in isolation from the total matrix. Yet it is clear that few of the classic examples of successful measurement of individual differences are involved, and that in many of the instances, the quality of the data might have been such as to magnify apparatus factors, etc. A more nearly ideal set of personality data upon which to illustrate the method was therefore sought in the multiple application of a set of rating scales in the assessment study of clinical psychologists (Kelly & Fiske, 1951).

In that study, "Rating Scale A" contained 22 traits referring to "behavior which can be directly observed on the surface." In using this scale the raters were instructed to "disregard any inferences about underlying dynamics or causes" (p. 207). The Ss, first-year clinical psychology students, rated themselves and also their three teammates with whom they had participated in the various assessment procedures and with whom they had lived for six days. The median of the three teammates' ratings was used for the Teammate score. The Ss were also rated on these 22 traits by the assessment staff. Our analysis uses the Final Pooled ratings, which were agreed upon by three staff members after discussion and review of the enormous amount of data and the many other ratings on each S. Unfortunately for our purposes, the staff members saw the ratings by Self and Teammates before making theirs, although presumably they were little influenced by these data because they had so much other evidence available to them. (See Kelly & Fiske, 1951, especially p. 64.) The Self and Teammate ratings represent entirely separate "methods" and can be given the major emphasis in evaluating the data to be presented.

In a previous analysis of these data (Fiske, 1949), each of the three heterotrait-monomethod triangles was computed and factored. To provide a multitrait-multimethod matrix, the 1452 heteromethod correlations have been computed especially for this report.² The full 66 × 66 matrix with its 2145 coefficients is obviously too large for presentation here, but will be used in analyses that follow. To provide an illustrative sample, Table 12 presents the interrelationships among five variables, selecting the one best representing each of the five recurrent factors discovered in Fiske's (1949) previous analysis of the monomethod matrices. (These were chosen without regard to their validity as indicated in the heteromethod blocks. Assertive, No. 3 reflected, was selected to represent Recurrent Factor 5 because Talkative had also a high

² We are indebted to E. Lowell Kelly for furnishing the V.A. assessment data to us, and to Hugh Lane for producing the matrix of intercorrelations. In the original report the correlations were based upon 128 men. The present analyses were based on only 124 of these cases because of clerical errors. This reduction in N leads to some very minor discrepancies between these values and those previously reported.


Table 12

                                Staff Ratings                Teammate Ratings                Self Ratings
                            A1    B1    C1    D1    E1    A2    B2    C2    D2    E2    A3    B3    C3    D3    E3
Staff Ratings
  Assertive         A1   (.89)
  Cheerful          B1     .37 (.85)
  Serious           C1    -.24  -.14 (.81)
  Unshakable Poise  D1     .25   .46   .08 (.84)
  Broad Interests   E1     .35   .19   .09   .31 (.92)
Teammate Ratings
  Assertive         A2     .71   .35  -.18   .26   .41 (.82)
  Cheerful          B2     .39   .53  -.15   .38   .29   .37 (.76)
  Serious           C2    -.27  -.31   .43  -.06   .03  -.15  -.19 (.70)
  Unshakable Poise  D2     .03  -.05   .03   .20   .07   .11   .23   .19 (.74)
  Broad Interests   E2     .19   .05   .04   .29   .47   .33   .22   .19   .29 (.76)
Self Ratings
  Assertive         A3     .48   .31  -.22   .19   .12   .46   .36  -.15   .12   .23   ( )
  Cheerful          B3     .17   .42  -.10   .10  -.03   .09   .24  -.25  -.11  -.03   .23   ( )
  Serious           C3    -.04  -.13   .22  -.13  -.05  -.04  -.11   .31   .06   .06  -.05  -.12   ( )
  Unshakable Poise  D3     .13   .27  -.03   .22  -.04   .10   .15   .00   .14  -.03   .16   .26   .11   ( )
  Broad Interests   E3     .37   .15  -.22   .09   .26   .27   .12  -.07   .05   .35   .21   .15   .17   .31   ( )
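The comparison of a validity value with the heterotrait values in its own row and column can be illustrated on the Teammate-Self block of Table 12. This is a sketch under stated assumptions rather than the authors' procedure: `count_exceptions` is a hypothetical helper, and it compares magnitudes (absolute values), a convention borrowed from the description of Table 13 that may not match every tally reported in the text.

```python
TRAITS = ["Assertive", "Cheerful", "Serious", "Unshakable Poise", "Broad Interests"]

# Teammate-Self heteromethod block transcribed from Table 12.
# Rows: Self Ratings traits; columns: Teammate Ratings traits.
TEAMMATE_SELF = [
    [.46, .36, -.15, .12, .23],
    [.09, .24, -.25, -.11, -.03],
    [-.04, -.11, .31, .06, .06],
    [.10, .15, .00, .14, -.03],
    [.27, .12, -.07, .05, .35],
]


def count_exceptions(block):
    """For each trait, count the heterotrait values in its row and column
    whose magnitude exceeds that trait's validity diagonal value."""
    n = len(block)
    counts = []
    for i in range(n):
        validity = block[i][i]
        comparisons = [block[i][j] for j in range(n) if j != i]   # same row
        comparisons += [block[j][i] for j in range(n) if j != i]  # same column
        counts.append(sum(abs(value) > validity for value in comparisons))
    return counts


for trait, exceptions in zip(TRAITS, count_exceptions(TEAMMATE_SELF)):
    print(f"{trait}: {exceptions} exception(s) out of 8 comparisons")
```

With only five traits each validity value faces 8 comparison values; in the full 22-trait matrix the same count would run over 42 values per trait and block.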
loading on the first recurrent factor.)

The picture presented in Table 12 is, we believe, typical of the best validity in personality trait ratings that psychology has to offer at the present time. It is comforting to note that the picture is better than most of those previously examined. Note that the validities for Assertive exceed heterotrait values of both the monomethod and heteromethod triangles. Cheerful, Broad Interests, and Serious have validities exceeding the heterotrait-heteromethod values with two exceptions. Only for Unshakable Poise does the evidence of validity seem trivial. The elevation of the reliabilities above the heterotrait-monomethod triangles is further evidence for discriminant validity.

A comparison of Table 12 with the full matrix shows that the procedure of having but one variable to represent each factor has enhanced the appearance of validity, although not necessarily in a misleading fashion. Where several variables are all highly loaded on the same factor, their "true" level of intercorrelation is high. Under these conditions, sampling errors can depress validity diagonal values and enhance others to produce occasional exceptions to the validity picture, both in the heterotrait-monomethod matrix and in the heteromethod-heterotrait triangles. In this instance, with an N of 124, the sampling error is appreciable, and may thus be expected to exaggerate the degree of invalidity.

Within the monomethod sections, errors of measurement will be correlated, raising the general level of values found, while within the heteromethods block, measurement errors are independent, and tend to lower the values both along the validity diagonal and in the heterotrait triangles. These effects, which may also be stated in terms of method factors or shared confounded irrelevancies, operate strongly in these data, as probably in all data involving ratings. In such cases, where several variables represent each factor, none of the variables consistently meets the criterion that validity values exceed the corresponding values in the monomethod triangles, when the full matrix is examined.

To summarize the validation picture with respect to comparisons of validity values with other heteromethod values in each block, Table 13 has been prepared. For each trait and for each of the three heteromethod blocks, it presents the value of the validity diagonal, the highest heterotrait value involving that trait, and the number out of the 42 such heterotrait values which exceed the validity diagonal in magnitude. (The number 42 comes from the grouping of the 21 other column values and the 21 other row values for the column and row intersecting at the given diagonal value.)

On the requirement that the validity diagonal exceed all others in its heteromethod block, none of the traits has a completely perfect record, although some come close. Assertive has only one trivial exception in the Teammate-Self block. Talkative has almost as good a record, as does Imaginative. Serious has but two inconsequential exceptions and Interest in Women three. These traits stand out as highly valid in both self-description and reputation. Note that the actual validity coefficients of these four traits range from but .22 to .82, or, if we concentrate on the Teammate-Self block as most certainly representing independent methods, from but .31 to .46. While these are the best traits, it seems that most of the traits have far above


Table 13

                                   Staff-Teammate          Staff-Self           Teammate-Self
                                 Val.  Highest     No.   Val.  Highest     No.   Val.  Highest     No.
                                          Het.  Higher            Het.  Higher            Het.  Higher
 1. Obstructiveness*              .30      .34       2    .16      .27       9    .19      .24       1
 2. Unpredictable                 .34      .26       0    .18      .24       3    .05      .19      29
 3. Assertive*                    .71      .65       0    .48      .45       0    .46      .48       1
 4. Cheerful*                     .53      .60       2    .42      .40       0    .24      .38       5
 5. Serious*                      .43      .35       0    .22      .27       2    .31      .24       0
 6. Cool, Aloof                   .49      .48       0    .20      .46      10    .02      .34      36
 7. Unshakable Poise              .20      .40      16    .22      .27       4    .14      .19      10
 8. Broad Interests*              .47      .46       0    .26      .37       6    .35      .32       0
 9. Trustful                      .26      .34       5    .08      .25      19    .11      .17       9
10. Self-centered                 .30      .34       2    .17      .27       6   -.07      .19      36
11. Talkative*                    .82      .65       0    .47      .45       0    .43      .48       1
12. Adventurous                   .45      .60       6    .28      .30       2    .16      .36      14
13. Socially Awkward              .45      .37       0    .06      .21      28    .04      .16      30
14. Adaptable*                    .44      .40       0    .18      .23      10    .17      .29       8
15. Self-sufficient*              .32      .33       1    .13      .18       5    .18      .15       0
16. Worrying, Anxious*            .41      .37       0    .23      .33       5    .15      .16       1
17. Conscientious                 .26      .33       4    .11      .32      19    .21      .23       2
18. Imaginative*                  .43      .46       1    .32      .31       0    .36      .32       0
19. Interest in Women*            .42      .43       2    .55      .38       0    .37      .40       1
20. Secretive, Reserved*          .40      .58       5    .38      .40       2    .32      .35       3
21. Independent Minded            .39      .42       2    .08      .25      19    .21      .30       3
22. Emotional Expression*         .62      .63       1    .31      .46       5    .19      .34      10

Note. Val. = value in validity diagonal; Highest Het. = highest heterotrait value; No. Higher = number of heterotrait values exceeding the validity diagonal.
* Trait names which have validities in all three heteromethod blocks significantly greater than the heterotrait-heteromethod values at the .001 level.
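The .001 criterion behind the asterisks in Table 13 rests on a one-tailed sign test over the 42 comparison values available for each trait in each block. A minimal sketch of that calculation follows; the function name `sign_test_p` is an assumption, and the test itself assumes, as the authors caution, that the comparisons are independent and that each falls above the validity value with probability one half under the null hypothesis.

```python
from math import comb


def sign_test_p(exceptions, comparisons=42):
    """One-tailed probability of observing `exceptions` or fewer comparison
    values above the validity value, if each of the `comparisons` values
    independently falls above it with probability 1/2."""
    favorable = sum(comb(comparisons, k) for k in range(exceptions + 1))
    return favorable / 2 ** comparisons


for k in (0, 5, 10, 11):
    print(f"{k:2d} exceptions: p = {sign_test_p(k):.5f}")
```

Under these assumptions, 10 exceptions out of 42 gives a probability just under .0005 and 11 exceptions gives roughly .0014, so 10 is the largest count that still falls below the .001 level.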

chance validity. All those having 10 or fewer exceptions have a degree of validity significant at the .001 level as crudely estimated by a one-tailed sign test.³ All but one of the variables meet this level for the Staff-Teammate block, all but four for the Staff-Self block, all but five for the most independent block, Teammate-Self. The exceptions to significant validity are not parallel from column to column, however, and only 13 of 22 variables have .001 significant validity in all three blocks. These are indicated by an asterisk in Table 13.

³ If we take the validity value as fixed (ignoring its sampling fluctuations), then we can determine whether the number of values larger than it in its row and column is less than expected on the null hypothesis that half the values would be above it. This procedure requires the assumption that the position (above or below the validity value) of any one of these comparison values is independent of the position of each of the others, a dubious assumption when common methods and trait variance are present.

This highly significant general level of validity must not obscure the meaningful problem created by the occasional exceptions, even for the best variables. The excellent traits of Assertive and Talkative provide a case in point. In terms of Fiske's original analysis, both have high loadings on the recurrent factor "Confident self-expression" (represented by Assertive in Table 12). Talkative also had high loadings on the recurrent factor of Social Adaptability (represented by Cheerful in Table 12). We would expect, therefore, both high correlation between them and significant discrimination as well. And even at the common sense level, most psychologists would expect fellow psychologists to discriminate validly between assertiveness (nonsubmissiveness) and talkativeness. Yet in the Teammate-Self block, Assertive rated by self correlates .48 with Talkative by teammates, higher than either of their validities in this block, .43 and .46.

In terms of the average values of the validities and the frequency of exceptions, there is a distinct trend for the Staff-Teammate block to show the greatest agreement. This can be attributed to several factors. Both represent ratings from the external point of view. Both are averaged over three judges, minimizing individual biases and undoubtedly increasing reliabilities. Moreover, the Teammate ratings were available to the Staff in making their ratings. Another effect contributing to the less adequate convergence and discrimination of Self ratings was a response set toward the favorable pole which greatly reduced the range of these measures (Fiske, 1949, p. 342). Inspection of the details of the instances of invalidity summarized in Table 13 shows that in most instances the effect is attributable to the high specificity and low communality for the self-rating trait. In these instances, the column and row intersecting at the low validity diagonal are asymmetrical as far as general level of correlation is concerned, a fact covered over by the condensation provided in Table 13.

The personality psychologist is initially predisposed to reinterpret self-ratings, to treat them as symptoms rather than to interpret them literally. Thus, we were alert to instances in which the self ratings were not literally interpretable, yet nonetheless had a diagnostic significance when properly "translated." By and large, the instances of invalidity of self-descriptions found in this assessment study are not of this type, but rather are to be explained in terms of an absence of communality for one of the variables involved. In general, where these self descriptions are interpretable at all, they are as literally interpretable as are teammate descriptions. Such a finding may, of course, reflect a substantial degree of insight on the part of these Ss.

The general success in discriminant validation coupled with the parallel factor patterns found in Fiske's earlier analysis of the three intramethod matrices seemed to justify an inspection of the factor pattern validity in this instance. One possible procedure would be to do a single analysis of the whole 66 × 66 matrix. Other approaches focused upon separate factoring of heteromethods blocks, matrix by matrix, could also be suggested. Not only would such methods be extremely tedious, but in addition they would leave undetermined the precise comparison of factor-pattern similarity. Correlating factor loadings over the population of variables was employed for this purpose by Fiske (1949) but while this provided for the identification of recurrent factors, no single over-all index of factor pattern similarity was generated. Since our immediate interest was in confirming a pattern of interrelationships, rather than in describing it, an efficient short cut was available: namely to test the similarity of the sets of heterotrait values by correlation coefficients in which each entry represented the size values of the given heterotrait coefficients in two different matrices. For the full matrix, such correlations would be based upon the N of the 22 × 21/2 or 231 specific heterotrait combinations. Correlations were computed between the Teammate and Self monomethods matrices, selected as maximally independent. (The values to follow were computed from the original correlation matrix and are somewhat higher than that which would be obtained from a reflected matrix.) The similarity between the two monomethods matrices was .84, corroborating the factor-pattern similarity between these matrices described more fully by Fiske in his parallel factor analyses of them. To carry this mode of analysis into the heteromethod block, this block was treated as though divided into two by the validity diagonal, the above diagonal values and the below diagonal representing the maximally independent validation of the heterotrait correlation pattern. These two correlated .63, a value which, while lower, shows an impressive degree of confirmation. There remains the question as to whether this pattern upon which the two heteromethod-heterotrait triangles agree is the same one found in common between the two monomethod triangles. The intra-Teammate matrix correlated with the two heteromethod triangles .71 and .71. The intra-Self matrix correlated with the two .57 and .63. In general, then, there is evidence for validity of the intertrait relationship pattern.

Relation to construct validity. While the validational criteria presented are explicit or implicit in the discussions of construct validity (Cronbach & Meehl, 1955; APA, 1954), this paper is primarily concerned with the adequacy of tests as measures of a construct rather than with the adequacy of a construct as determined by the confirmation of theoretically predicted associations with measures of other constructs. We believe that before one can test the relationships between a specific trait and other traits, one must have some confidence in one's measures of that trait. Such confidence can be supported by evidence of convergent and discriminant validation. Stated in different words, any conceptual formulation of trait will usually include implicitly the proposition that this trait is a response tendency which can be observed under more than one experimental condition and that this trait can be meaningfully differentiated from other traits. The testing of these two propositions must be prior to the testing of other propositions to prevent the acceptance of erroneous conclusions. For example, a conceptual framework might postulate a large correlation between Traits A and B and no correlation between Traits A and C. If the experimenter then measures A and B by one method (e.g., questionnaire) and C by another method (such as the measurement of overt behavior in a situation test), his findings may be consistent with his hypotheses solely as a function of method variance common to his measures of A and B but not to C.

The requirements of this paper are intended to be as appropriate to the relatively atheoretical efforts typical of the tests and measurements field as to more theoretical efforts. This emphasis on validational criteria appropriate to our present atheoretical level of test construction is not at all
incompatible with a recognition of the desirability of increasing the extent to which all aspects of a test and the testing situation are determined by explicit theoretical considerations, as Jessor and Hammond have advocated (Jessor & Hammond, 1957).

Relation to operationalism. Underwood (1957, p. 54) in his effective presentation of the operationalist point of view shows a realistic awareness of the amorphous type of theory with which most psychologists work. He contrasts a psychologist's "literary" conception with the latter's operational definition as represented by his test or other measuring instrument. He recognizes the importance of the literary definition in communicating and generating science. He cautions that the operational definition "may not at all measure the process he wishes to measure; it may measure something quite different" (1957, p. 55). He does not, however, indicate how one would know when one was thus mistaken.

The requirements of the present paper may be seen as an extension of the kind of operationalism Underwood has expressed. The test constructor is asked to generate from his literary conception or private construct not one operational embodiment, but two or more, each as different in research vehicle as possible. Furthermore, he is asked to make explicit the distinction between his new variable and other variables, distinctions which are almost certainly implied in his literary definition. In his very first validational efforts, before he ever rushes into print, he is asked to apply the several methods and several traits jointly. His literary definition, his conception, is now best represented in what his independent measures of the trait hold distinctively in common. The multitrait-multimethod matrix is, we believe, an important practical first step in avoiding "the danger ... that the investigator will fall into the trap of thinking that because he went from an artistic or literary conception ... to the construction of items for a scale to measure it, he has validated his artistic conception" (Underwood, 1957, p. 55). In contrast with the single operationalism now dominant in psychology, we are advocating a multiple operationalism, a convergent operationalism (Garner, 1954; Garner, Hake, & Eriksen, 1956), a methodological triangulation (Campbell, 1953, 1956), an operational delineation (Campbell, 1954), a convergent validation.

Underwood's presentation and that of this paper as a whole imply moving from concept to operation, a sequence that is frequent in science, and perhaps typical. The same point can be made, however, in inspecting a transition from operation to construct. For any body of data taken from a single operation, there is a subinfinity of interpretations possible; a subinfinity of concepts, or combinations of concepts, that it could represent. Any single operation, as representative of concepts, is equivocal. In an analogous fashion, when we view the Ames distorted room from a fixed point and through a single eye, the data of the retinal pattern are equivocal, in that a subinfinity of hexahedrons could generate the same pattern. The addition of a second viewpoint, as through binocular parallax, greatly reduces this equivocality, greatly limits the constructs that could jointly account for both sets of data. In Garner's (1954) study, the fractionation measures from a single method were equivocal: they could have been a function of the stimulus distance being fractionated, or they

could have been a function of the comparison stimuli used in the judgment process. A multiple, convergent operationalism reduced this equivocality, showing the latter conceptualization to be the appropriate one, and revealing a preponderance of methods variance. Similarly for learning studies: in identifying constructs with the response data from animals in a specific operational setup there is equivocality which can operationally be reduced by introducing transposition tests, different operations so designed as to put to comparison the rival conceptualizations (Campbell, 1954).

Garner's convergent operationalism and our insistence on more than one method for measuring each concept depart from Bridgman's early position that "if we have more than one set of operations, we have more than one concept, and strictly there should be a separate name to correspond to each different set of operations" (Bridgman, 1927, p. 10). At the current stage of psychological progress, the crucial requirement is the demonstration of some convergence, not complete congruence, between two distinct sets of operations. With only one method, one has no way of distinguishing trait variance from unwanted method variance. When psychological measurement and conceptualization become better developed, it may well be appropriate to differentiate conceptually between Trait-Method Unit A1 and Trait-Method Unit A2, in which Trait A is measured by different methods. More likely, what we have called method variance will be specified theoretically in terms of a set of constructs. (This has in effect been illustrated in the discussion above in which it was noted that the response set variance might be viewed as trait variance, and in the rearrangement of the social intelligence matrices of Tables 4 and 5.) It will then be recognized that measurement procedures usually involve several theoretical constructs in joint application. Using obtained measurements to estimate values for a single construct under this condition still requires comparison of complex measures varying in their trait composition, in something like a multitrait-multimethod matrix. Mill's joint method of similarities and differences still epitomizes much about the effective experimental clarification of concepts.

The evaluation of a multitrait-multimethod matrix. The evaluation of the correlation matrix formed by intercorrelating several trait-method units must take into consideration the many factors which are known to affect the magnitude of correlations. A value in the validity diagonal must be assessed in the light of the reliabilities of the two measures involved: e.g., a low reliability for Test A2 might exaggerate the apparent method variance in Test A1. Again, the whole approach assumes adequate sampling of individuals: the curtailment of the sample with respect to one or more traits will depress the reliability coefficients and intercorrelations involving these traits. While restrictions of range over all traits produce serious difficulties in the interpretation of a multitrait-multimethod matrix and should be avoided whenever possible, the presence of different degrees of restriction on different traits is the more serious hazard to meaningful interpretation.

Various statistical treatments for multitrait-multimethod matrices might be developed. We have considered rough tests for the elevation
of a value in the validity diagonal above the comparison values in its row and column. Correlations between the columns for variables measuring the same trait, variance analyses, and factor analyses have been proposed to us. However, the development of such statistical methods is beyond the scope of this paper. We believe that such summary statistics are neither necessary nor appropriate at this time. Psychologists today should be concerned not with evaluating tests as if the tests were fixed and definitive, but rather with developing better tests. We believe that a careful examination of a multitrait-multimethod matrix will indicate to the experimenter what his next steps should be: it will indicate which methods should be discarded or replaced, which concepts need sharper delineation, and which concepts are poorly measured because of excessive or confounding method variance. Validity judgments based on such a matrix must take into account the stage of development of the constructs, the postulated relationships among them, the level of technical refinement of the methods, the relative independence of the methods, and any pertinent characteristics of the sample of Ss. We are proposing that the validational process be viewed as an aspect of an ongoing program for improving measuring procedures and that the "validity coefficients" obtained at any one stage in the process be interpreted in terms of gains over preceding stages and as indicators of where further effort is needed.

The design of a multitrait-multimethod matrix. The several methods and traits included in a validational matrix should be selected with care. The several methods used to measure each trait should be appropriate to the trait as conceptualized. Although this view will reduce the range of suitable methods, it will rarely restrict the measurement to one operational procedure.

Wherever possible, the several methods in one matrix should be completely independent of each other: there should be no prior reason for believing that they share method variance. This requirement is necessary to permit the values in the heteromethod-heterotrait triangles to approach zero. If the nature of the traits rules out such independence of methods, efforts should be made to obtain as much diversity as possible in terms of data-sources and classification processes. Thus, the classes of stimuli or the background situations, the experimental contexts, should be different. Again, the persons providing the observations should have different roles or the procedures for scoring should be varied.

Plans for a validational matrix should take into account the difference between the interpretations regarding convergence and discrimination. It is sufficient to demonstrate convergence between two clearly distinct methods which show little overlap in the heterotrait-heteromethod triangles. While agreement between several methods is desirable, convergence between two is a satisfactory minimal requirement. Discriminative validation is not so easily achieved. Just as it is impossible to prove the null hypothesis, or that some object does not exist, so one can never establish that a trait, as measured, is differentiated from all other traits. One can only show that this measure of Trait A has little overlap with those measures of B and C, and no dependable generalization beyond B and C can be made. For example, social poise could probably

be readily discriminated from aes- the nontrai t a ttributes of each test.

thetic interests, but it should also be The failure to demonstrate conver-
differentiated from leadership. gence may lead to conceptual devel-
Insofar as the traits are related and opments rather than to the abandon-
are expected to correlate with each ment of a test.
other, the monomethod correlations
will be substantial and heteromethod
correlations between traits will also This paper advocates a valida-
be positive. For ease of interpreta- tional process utilizing a rnatrix of
tion, it may be best to include in the intercorrelations among tests repre-
matrix at least two traits, and prefer- senting at least two traits, each meas-
ably two sets of traits, which are ured by at least two methods. Meas-
postulated to be independent of each ures of the same trait should correlate
other. higher with each other than they do
In closing, a word of caution is with measures of different traits in-
needed. Many multitrait-multi- volving separate methods. Ideally,
method matrices will show no con- these validity values should also be
vergent validation: no relationship higher than the correlations among
may be found between two methods different traits measured by the same
of measuring a trait. I n this common method.
situation, the experimenter should I] 1uRtrations from the literature
examine the evidence in favor of sev- show that these desirable conditions,
eral alternative propositions: (a) as a set, are rarely met. Method or
Neither method is adequate for meas- apparatus factors make very large
uring the trait; (b) One of the two contributions to psychological meas-
methods does not really measure the urements.
trait. (When the evidence indicates The notions of convergence be-
that a method does not measure the tween independent measures of the
postulated trait, it may prove to same trait and discrimination be-
measure some other trait. High cor- tween measures of different traits are
relations in the heterotrait-hetero- compared \vith previously published
method triangles may provide hints formulations, such as construct valid-
to such possibilities.) (c) The trait is ity and convergent operationalism,
not a functional unity, the response Problems in the application of this
tendencies involved being specific to validational process are considered.
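The two comparisons stated in the summary can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the analyses reported in the paper: the function name `mtmm_checks`, the trait and method labels, and every correlation value are hypothetical. For each validity value (same trait, different methods) it asks whether that value exceeds the heterotrait-heteromethod values involving the same two tests, and whether it also exceeds the heterotrait-monomethod values for those tests.

```python
from itertools import combinations


def mtmm_checks(labels, corr):
    """Check each validity value against its comparison correlations.

    labels[i] is a (trait, method) pair for test i (hypothetical naming);
    corr is the symmetric matrix of intercorrelations as a list of lists.
    """
    report = []
    n = len(labels)
    for i, j in combinations(range(n), 2):
        same_trait = labels[i][0] == labels[j][0]
        same_method = labels[i][1] == labels[j][1]
        if not same_trait or same_method:
            continue  # only same-trait, different-method pairs are validity values
        validity = corr[i][j]
        heteromethod, monomethod = [], []
        # Look at each of the two tests in turn, paired with tests of other traits.
        for a, b in ((i, j), (j, i)):
            for k in range(n):
                trait_k, method_k = labels[k]
                if trait_k == labels[a][0]:
                    continue  # same trait: not a heterotrait comparison
                if method_k == labels[b][1]:
                    # different trait, measured by the partner test's method
                    heteromethod.append(corr[a][k])
                elif method_k == labels[a][1]:
                    # different trait, measured by this test's own method
                    monomethod.append(corr[a][k])
        report.append({
            "trait": labels[i][0],
            "methods": (labels[i][1], labels[j][1]),
            "validity": validity,
            "beats_heteromethod": all(validity > r for r in heteromethod),
            "beats_monomethod": all(validity > r for r in monomethod),
        })
    return report


# Hypothetical data: Traits A and B, each measured by Methods 1 and 2.
labels = [("A", "m1"), ("B", "m1"), ("A", "m2"), ("B", "m2")]
corr = [
    [1.00, 0.50, 0.60, 0.20],
    [0.50, 1.00, 0.25, 0.45],
    [0.60, 0.25, 1.00, 0.40],
    [0.20, 0.45, 0.40, 1.00],
]

results = mtmm_checks(labels, corr)
for row in results:
    print(row)
```

In these invented data both validity values exceed their heterotrait-heteromethod comparisons, but the value for Trait B (.45) falls below one of the same-method correlations between different traits (.50). That is the pattern the summary attributes to method or apparatus factors.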

Received June 19, 1958.