
JOURNAL OF RESEARCH IN PERSONALITY 20, 509-528 (1986)

The Rotter Locus of Control Scale: The Comparison of Alternative


Response Formats and Implications for Reliability, Validity,
and Dimensionality

HERBERT W. MARSH

Faculty of Education, The University of Sydney

AND

GARRY E. RICHARDS

Australian Outward Bound School

Participants completed the Rotter internal/external (IE) instrument using three


different response formats before and after completion of the Outward Bound
program, and were evaluated by external observers at the end of the intervention.
Multitrait-multimethod analyses indicated that five specific IE facets identified
in previous research were consistently distinguished with each of the response
formats. While responses were substantially more internal after the intervention,
effect sizes varied with the IE facet and with the response format. Observer
responses were significantly correlated with self-responses, and provided additional
support for the construct validity of responses to the Rotter instrument and the
interpretation of the intervention effect. Nevertheless, problems with the Rotter
instrument were identified, and the implications for further research were discussed.
© 1986 Academic Press, Inc.

INTRODUCTION
The Internal-External construct as inferred from the Rotter scale. In-
ternal-external (IE) locus of control is hypothesized to be a bipolar con-
struct. The locus is internal if a person perceives events to be contingent
upon his/her behavior or relatively enduring personal characteristics; the
locus is external when events are seen to be contingent upon luck, fate,
the control of powerful others, the environment, or some characteristic
not under his/her own control (Lefcourt, 1976, 1981; Rotter, 1966, 1975;

The authors acknowledge the assistance of Jennifer Barnes in data preparation,
and thank the participants and staff of the Outward Bound programme. Address requests
for reprints to Dr. Herbert Marsh. Faculty of Education, University of Sydney, Sydney,
NSW 2006. Australia.

0092-6566/86 $3.00
Copyright © 1986 by Academic Press, Inc.
All rights of reproduction in any form reserved.

Stipek & Weisz, 1981). While a large number of I-E scales have been
developed, the most widely used is the Rotter Scale, and this instrument
will be the focus of the present investigation. The Rotter I-E scale consists
of 23 pairs of statements, using a forced-choice format, and six filler
questions. Each pair contains one internal statement and one external
statement, and subjects make a dichotomous choice between the two
alternatives. The Rotter instrument is based on the assumptions that (a)
the IE construct is relatively unidimensional, (b) internality and externality
represent endpoints of a bipolar dimension, and (c) the use of a dichotomous
forced-choice format is the most effective way to infer the construct.
The purpose of the present investigation is to examine these assumptions.
Although Rotter suggested that the IE construct inferred from his
instrument was relatively unidimensional, subsequent research has shown
it to be clearly multidimensional (e.g., Abrahamson, Schludermann, &
Schludermann, 1973; Collins, 1974; Dixon, McKee, & McRae, 1976;
Gurin, Gurin, & Morrison, 1978; Marsh & Richards, in press; Mirels, 1970;
O’Brien & Kabanoff, 1981; Watson, 1981; and Zuckerman & Gerbasi,
1977). Marsh and Richards (in press), on the basis of a review of this research
and empirical analyses with confirmatory factor analysis, found that five
factors can be identified in responses to the Rotter scale: General Luck
(GL); Political Control (PC); Success via Personal Initiative (SV);
Interpersonal Control in Social Relations (IC); and Control in Academic
Situations (AS). It should be noted, however, that the validity of responses
to the Rotter instrument does not depend on the unidimensionality as-
sumption. Scores representing the separate facets may be useful, and
the total score may adequately reflect a higher-order or more general
construct that incorporates the specific components.
Rotter also assumed that the IE construct is bipolar: that the correlation
between independently derived measures of internality and externality
would approach -1.0 when corrected for unreliability. While Rotter
presented a theoretical justification for this assumption, it is not testable
with the forced-choice format employed in his instrument, and research
with other scales suggests that the IE construct may not be bipolar when
independent ratings are made of internal and external items (e.g., Marsh,
Cairns, Relich, Barnes, & Debus, 1984; also see Collins, 1974; Klockars &
Varnum, 1975; Zuckerman & Gerbasi, 1977). Unless this bipolarity
assumption can be supported, the forced-choice format used in the Rotter
IE scale may be dubious. Even if the bipolarity assumption and forced-
choice format are supported, it may be that an expanded forced-choice
scale is superior to the dichotomous scale used by Rotter.
The present investigation. Marsh, Richards, and Barnes (1986)
found that participation in the Outward Bound program produced more
internal scores on the Rotter IE scale. Since the program is specifically
designed to affect the IE construct, this finding provided support for the
construct validity of the Rotter IE scale. In the present investigation
Outward Bound participants responded to the Rotter instrument before
and after the program using three different response formats: the original
dichotomous forced choices, an expanded 11-category forced-choice format,
and independent ratings of the internal and the external statement rep-
resenting each item pair. In addition, external observers judged each
participant at the end of the program with the Rotter instrument.
The purpose of the present investigation is to compare responses obtained
using the different response formats. Specific questions to be addressed
are: (1) Do before-after differences vary with the response format or
with the particular component of the Rotter instrument? (2) Do scores
derived from the different response formats (total scores and scores
representing previously identified components of the Rotter scale) agree?
(3) Do scores based on self-responses agree with those based on responses
by external observers? The answers to these questions will be related
to issues of the reliability, dimensionality, and validity of responses to
the Rotter scale and the IE construct.
METHOD
Subjects were the 71 participants in an Australian Outward Bound program conducted
in August 1984. Subjects varied in age between 18 and 34 (median age = 22), a majority
were males (78%), and most were unmarried (87%). The Outward Bound program is a 26-
day residential program that consists of vigorous outdoor activities that are designed to
create a more internal orientation (see Marsh et al., 1986, and Richards, 1977, for
further discussion of the program). As part of their participation in the program, all subjects
completed three different versions of the Rotter instrument on the first day of the program
(time 1) and again on the last day of the program (time 2). In order to facilitate the
administration of the instruments, the order of presentation was held constant across all
subjects and for both occasions.
On the original Rotter instrument respondents were presented with pairs of statements,
one internal and one external, and were asked to select the one statement that was most
true of them (a dichotomous forced-choice format). On the expanded forced-choice version,
subjects were instructed: “The scores allocated to each pair of statements must add up
to 11. Therefore if you absolutely agree with one statement, yet totally disagree with the
other, you should allocate a score of 11 to the first and 0 to the other. Similarly, if you
agreed slightly more with one statement than the other, you would apportion 6 to this
statement and 5 to the other.” The statements, the way they were paired, and their ordering
were the same as the original version. On the independent rating format, subjects were
instructed to judge each statement on an 8-point response scale that varied from
“1-strongly disagree” to “8-strongly agree.” The statements were the original Rotter statements,
but they were not paired, and the order of the statements was rerandomized. For purposes
of the present investigation, these three versions are called the original format, the expanded
(forced-choice) format, and the rating format.
For most of the Outward Bound program, participants work in small groups, and activities
are specifically designed to foster intense interaction and cooperation among group members.
Hence, by the end of the 26-day program, group members have observed each other in a
wide range of experiences. On the last day of the program, after subjects had completed
the self-report Rotter instruments, they were asked to complete additional instruments
describing each other member in their group. For this task, subjects were instructed: “This
is a questionnaire designed to find out how well a person (in this case, YOU) can assess
views which another person might hold. You should base your assessment on everything
you know about the person, that is, what they say, what they do, the way you feel they
think about things in general and themselves.” In completing this task, approximately half
the subjects used the original format in making their judgments about all the other group
members, and half used the expanded forced-choice format; responses to the independent
rating task were not collected due to time limitations. Through this process, each subject
was described by approximately three observers using the original format and by approximately
three different observers using the expanded format.
Eight sets of scores were computed for each subject: (a) three sets of scores from time 1
(self-responses to the original, the expanded, and the rating formats); (b) three corresponding
sets of self-response scores from time 2; and (c) two sets of scores representing observer-
responses at time 2 (the original and expanded formats). In each set there were five scores
representing the five factors previously identified in responses to the Rotter instrument¹
and a total score. For just the independent rating tasks, total internal and total external
scores were also computed (these could not be determined for the original and expanded
forced-choice formats). Because of the design of the study, nearly all subjects completed
all six self-report instruments (418 of 426 instruments were completed), and there were
few missing responses on the completed instruments (less than 1/2 of 1%; the group mean
was substituted for the few missing values that did occur on otherwise completed instruments).
Scores for the two sets of external observations, one for the original format and one for
the expanded format, each represented the mean responses of approximately three observers.
For the original Rotter format, the response to each of the 23 item pairs was scored 1
(internal) or 0 (external), and scores for the five components and the total consisted of
the number of internal statements that were selected. For the expanded forced-choice
format, each item pair was scored on an 11 (most internal) to 0 (most external) scale, and
scores for the five components and the total were the sum of these responses. For the
independent rating format, each item pair from the original format was represented by
independent ratings of the internal and the external statement. Each of the five components
and the total score was represented as the sum of responses to the internal statements
minus the sum of responses to the external statements. The total internal score was the
sum of responses to the internal statements, and the total external score was the sum of
responses to the external statements.
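
The scoring rules described in this section can be sketched in Python. This is only an illustration under stated assumptions: the function names and the dictionary layout are hypothetical (they do not come from the original study), items are numbered 1-23 with the filler items removed, facet membership follows the item lists given in the facet footnote, and scores are formed as sums, as described in the text above.

```python
# Facet membership as listed in the paper's footnote (items 1-23, fillers excluded).
FACETS = {
    "GL": [1, 12, 13, 15, 17, 20],   # General Luck
    "PC": [2, 10, 14, 18, 23],       # Political Control
    "SV": [5, 7, 9, 11, 22],         # Success via Personal Initiative
    "IC": [3, 6, 16, 21],            # Interpersonal Control
    "AS": [4, 8, 19],                # Control in Academic Situations
}
ITEMS = range(1, 24)


def score_forced_choice(internal_points):
    """Score the original (0/1) or expanded (0-11) forced-choice format.

    internal_points maps item number -> points given to the internal
    statement of that pair (hypothetical data layout).
    """
    scores = {name: sum(internal_points[i] for i in items)
              for name, items in FACETS.items()}
    scores["Total"] = sum(internal_points[i] for i in ITEMS)
    return scores


def score_ratings(internal, external):
    """Score the independent rating format (each statement rated 1-8).

    Facet and total scores are internal ratings minus external ratings.
    """
    scores = {name: sum(internal[i] - external[i] for i in items)
              for name, items in FACETS.items()}
    scores["Total internal"] = sum(internal[i] for i in ITEMS)
    scores["Total external"] = sum(external[i] for i in ITEMS)
    scores["Total"] = scores["Total internal"] - scores["Total external"]
    return scores


# Hypothetical respondent who leans slightly internal on every pair (6 vs. 5).
expanded = {i: 6 for i in ITEMS}
print(score_forced_choice(expanded)["Total"])  # 138
```

The same `score_forced_choice` function covers both forced-choice formats because they differ only in the range of points assigned to the internal statement of each pair.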

RESULTS
Program Effects with Different Formats
The purpose of the first set of analyses is to determine if the effect
of participation in the Outward Bound program, the difference between
time 1 and time 2 responses, varied as a function of the particular IE
component or the response format used to assess it. In a preliminary
analysis conducted with the commercially available MANOVA procedure
(Hull & Nie, 1981), the effect of time was highly significant, but its effect

¹ The five IE facets were defined to be the unweighted average of responses to the
following items (the numbers 1-23 refer to the 23 Rotter items, excluding the filler items,
in the order that they appear on the original Rotter instrument): General Luck (1, 12, 13,
15, 17, and 20); Political Control (2, 10, 14, 18, and 23); Success via Personal Initiative
(5, 7, 9, 11, and 22); Interpersonal Relations (3, 6, 16, and 21); Control in Academic
Situations (4, 8, and 19).

varied significantly with both the IE component and the response format.
The nature of these complex interactions was examined in a set of analyses
summarized in Table 1. For all three response formats, scores are more
internal at time 2 than time 1, and the differences are statistically significant
for all but the IC facet. The effect sizes are larger for the expanded
forced-choice and independent rating formats than for the original format.
While the ordering of the effect sizes for different IE components is not
completely consistent for the three response formats, the effects for the
GL and SV (and also the total score) tend to be the largest, while the
effect for IC fails to reach statistical significance for any of the formats.
Also, for the independent rating task, total internal scores are more
internal at time 2 than time 1, and total external scores are less external.
An evaluation of the effectiveness of the Outward Bound program is
not the major purpose of this investigation, and alternative explanations
of the time l/time 2 differences may be viable and will be discussed in
more detail elsewhere (see Marsh et al., 1986, for further discussion).
Nevertheless, if a specific intervention designed to alter the IE construct
results in a systematic change in responses to the Rotter instrument,
then the findings provide one source of support for the construct validity
of responses to the IE scale. However, these findings also suggest that
this support for the construct validity varies with the IE component and
with the response format. In particular, the support is stronger for both
of the alternative formats than for the original format.
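
As a rough check on how the effect sizes in Table 1 are formed, the definition given in the table note (time 2 mean minus time 1 mean, divided by the time 1 standard deviation) can be sketched as follows. The function name is hypothetical, and the inputs are the rounded means and standard deviations printed in Table 1, so a recomputed value can differ from the published one by about .01.

```python
def effect_size(pre_mean, pre_sd, post_mean):
    """Pretest/post-test difference divided by the pretest standard deviation."""
    return (post_mean - pre_mean) / pre_sd


# Total score, expanded forced-choice format (values as printed in Table 1).
print(round(effect_size(140.8, 19.3, 157.5), 2))  # 0.87

# Total score, independent rating format (values as printed in Table 1).
print(round(effect_size(10.1, 26.1, 27.7), 2))    # 0.67
```

Because the denominator is the pretest standard deviation, formats whose scores have relatively less pretest spread for the same mean shift will show larger effect sizes.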

Agreement among Different Indicators of the IE Construct


Total scores. Correlations among total scores for the different response
formats (for self-responses at time 1 and time 2, and for observer-responses
at time 2) are shown in Table 2. The average correlation between the
three total scores representing time 1 responses is .72 and is approximately
the same as the average reliability estimate² of these scores. At time 2,
the corresponding average correlation and average reliability estimate
are both about .88. The higher agreement among the different formats
and the higher reliabilities at time 2 probably reflect the systematic impact
of the Outward Bound program, but it may also reflect an increased

² The comparison of coefficient alpha estimates of reliability with correlations between
responses to the same set of items using different response formats is complex. In particular
the alpha coefficients do not provide an upper limit to the correlations. For example, if
there is considerable error/uniqueness that is specific to individual items and the different
response formats contribute little unique error to the responses, then the correlations
between different formats would be higher than the coefficient alphas (e.g., see results in
Table 3). Nevertheless, the fact that the coefficient alphas and the correlations are of
similar magnitude for the total scores suggests that the different formats are measuring a
similar construct.
TABLE 1
INTERVENTION EFFECT SIZES FOR ALTERNATIVE FORMS OF THE ROTTER IE SCALE

                                   Time 1         Time 2    Time 1/time 2  Effect
Rotter score                      Mean     SD    Mean     SD      corr     sizeᵃ

Original format
  General luck (GL)                3.4    1.3     3.8    1.5      .52**    .33*
  Political control (PC)           1.8    1.4     2.6    1.7      .49**    .43**
  Success via initiative (SV)      3.3    1.3     3.7    1.4      .47**    .33**
  Interpersonal control (IC)       1.7    1.2     1.9    1.3      .32**    .16
  Academic situations (AS)         1.9     .9     2.3     .9      .51**    .17*
  Total                           12.0    3.6    14.1    4.8      .55**    .57**
Expanded forced-choice
  General luck (GL)               37.0    7.2    42.8   10.4      .55**    .80**
  Political control (PC)          24.5    7.9    28.5   10.4      .66**    .50**
  Success via initiative (SV)     33.9    6.2    37.9    8.2      .50**    .65**
  Interpersonal control (IC)      22.6    5.0    24.1    6.8      .42**    .31
  Academic situations (AS)        20.9    4.4    23.5    5.0      .55**    .56**
  Total                          140.8   19.3   157.5   32.2      .56**    .87**
Independent ratingᵇ
  General luck (GL)                3.1   10.2     9.2   11.2      .56**    .60**
  Political control (PC)          -4.0    9.8     0.1   11.8      .62**    .42**
  Success via initiative (SV)      6.1    7.2    10.3    9.4      .45**    .69**
  Interpersonal control (IC)       0.0    7.0     0.5    8.1      .56**    .08
  Academic situations (AS)         5.0    6.2     7.5    6.3      .74**    .41**
  Total internal                 115.5   15.5   126.0   19.2      .55**    .68**
  Total external                 105.4   16.7    98.3   20.6      .59**    .43**
  Total                           10.1   26.1    27.7   36.3      .60**    .67**

Note. A repeated-measures ANOVA was conducted to determine the significance of the pretest post-test difference. While the main effect of
time was highly significant, the size of this effect varied (i.e., interacted significantly) with particular scale and the form of responses. Consequently,
paired t tests were employed to examine pair-wise differences.
* p < .05; ** p < .01.
ᵃ Effect size is defined as the pretest (time 1) post-test (time 2) difference divided by the pretest standard deviation. The tests of statistical
significance presented with the effect sizes are based on paired t tests.
ᵇ For the independent rating format, the five specific factor and the total scores are the sum of responses to internal statements minus the sum
of responses to external statements. The Total Internal and Total External scores are the sum of responses to the internal and external statements.
TABLE 2
CORRELATIONS AMONG ROTTER TOTAL SCORES

Scales                       1     2     3     4     5     6     7     8     9    10    11    12

Time 1 self-responses
  1. Original format      (63)
  2. Expanded               71  (71)
  3. Ratings-internal       65    56  (75)
  4. Ratings-external      -60   -54   -30  (78)
  5. Ratings-total          71    69    79   -82  (81)
Time 2 self-responses
  6. Original format        55    53    43   -42    53  (81)
  7. Expanded               44    56    42   -32    45    85  (91)
  8. Ratings-internal       50    51    55   -34    55    82    82  (88)
  9. Ratings-external      -45   -45   -28    59   -55   -76   -82   -61    WN
 10. Ratings-total          53    52    45   -52    60    88    91    89   -91  (93)
Time 2 observer-responsesᵃ
 11. Original format        16    02    06   -07    08    34    29    40   -21    33  (84)ᵃ
 12. Expanded               23    31    14   -19    20    42    33    28   -30    32    41  (85)

Note. Correlations, presented without decimal points, larger than .23 are statistically significant. The values in parentheses are coefficient alpha estimates of reliability.
ᵃ The multiple observer ratings for each subject, about three for each type of response, were averaged, and this average was correlated with the self-ratings. The
reliability estimates for the observer responses are for separate sets of responses and are comparable to reliability estimates for each of the other total scores.

familiarity with the Rotter instrument. For both time 1 and time 2, responses
to the original format are least reliable, while those to the rating format
are most reliable.
An important purpose of the present investigation is to test the bipolarity
of the IE construct. While this assumption is not testable with the original
Rotter format (or the expanded forced-choice format), the correlation
between the total internal and total external scores for the rating format
does provide such a test. This correlation is -.30 and -.61 (-.39 and
-.68 after correction for attenuation) for responses from time 1 and time
2, respectively. The more negative correlation at time 2 probably again
reflects the effect of the intervention (i.e., participants became more
internal and less external), but may also reflect an increased familiarity
with the scale. While these correlations are clearly in the predicted
direction, the size of the correlations (even after correction for unreliability,
and particularly at time 1, which more closely approximates the typical
application of the instrument) may not be sufficiently large to support
the assumption of bipolarity.
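
The correction for attenuation applied to the time 1 correlation can be illustrated with a short sketch. It assumes the standard Spearman formula (observed correlation divided by the square root of the product of the two reliabilities) and the time 1 coefficient alphas reported in Table 2 for the total internal (.75) and total external (.78) rating scores; the text does not spell out the formula, and the function name is hypothetical.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman correction: observed r divided by sqrt of the reliabilities' product."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Time 1: total internal vs. total external scores on the rating format.
print(round(disattenuate(-0.30, 0.75, 0.78), 2))  # -0.39
```

Under these assumptions the sketch reproduces the corrected time 1 value of -.39, which remains far from the -1.0 that strict bipolarity would imply.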
Correlations between observer-responses and self-responses offer an
important test of the validity of self-responses to the Rotter instrument
against an external validity criterion. The average correlation between
self-responses and observer responses at time 2 is .34, and this does not
seem to depend on the particular response format (though only the original
and expanded formats were completed by the external observers). These
findings offer modest support for the validity of responses to the Rotter
total score, but do not appear to offer evidence for the superiority of
any of the three response formats.
To the extent that the Outward Bound program does have a systematic
(i.e., valid) effect on the IE construct, then observer responses at the
end of the program should correlate more highly with self-responses at
the end of the program than with self-responses before the start of the
program. Inspection of the correlations in Table 2 supports this prediction
and thus offers further support for the interpretation of the intervention
and for the validity of responses to the Rotter instrument. While alternative
explanations may again be viable, these findings, coupled with the sys-
tematic shifts in responses to both internal and external items as described
earlier, do suggest that the Outward Bound program does systematically
affect the IE construct.
Specific Rotter facets: Self-responses. Multidimensionality is typically
inferred through techniques such as factor analysis and multitrait-mul-
timethod (MTMM) analysis. Marsh and Richards (in press) argued for
the existence of five factors in the Rotter instrument on the basis of a
review of previous factor analytic research and the results of their con-
firmatory factor analyses. One purpose of the present investigation is to
further test these factors with MTMM analyses. Campbell and Fiske
(1959) argue that multiple indicators of the same construct should be
substantially correlated (convergence), but that indicators of different
traits should be less correlated (divergence). For purposes of this study,
the multiple traits are the five Rotter factors, and the multiple methods
are the three different response formats. MTMM matrices representing
these 15 variables are presented separately for responses from time 1
and time 2 (see Table 3),³ and these are examined with the four criteria
developed by Campbell and Fiske (1959; also see Marsh, Barnes, &
Hocevar, 1985; Marsh & Hocevar, 1983, 1984).
The application of the four Campbell-Fiske guidelines to the two MTMM
matrices indicates that
(1) All 15 convergence coefficients (correlations between the indicators
of the same trait assessed by different response formats; the underlined
values in Table 3) are statistically significant for responses at time 1 and
at time 2, and the means of these coefficients (.62 and .78) are substantial.
(2) Each convergence coefficient (mean r’s = .62 and .78 for time 1
and time 2) is higher than other correlations in the same row or column
of the corresponding heterotrait-heteromethod squares for all 120 com-
parisons (mean r = .20) for time 1 responses and all 120 comparisons
(mean r = .41) for time 2 responses.
(3) Each convergence coefficient (mean r’s = .62 and .78 for time 1
and time 2) is higher than other correlations in the same row or column
of the corresponding heterotrait-monomethod triangles for 112 of 120 comparisons
(mean r = .24) for time 1 responses and for 113 of 120 comparisons
(mean r = .44) for time 2 responses.
(4) The pattern of correlations among the different traits appears to
be similar for each of the response formats for both time 1 and time 2
responses (in particular the correlation between GL and SV is always
largest, while correlations involving PC and to a lesser extent IC tend
to be lower).
These findings provide strong support for both the convergence of
scores from the different response formats and the distinctiveness of the

³ In some applications of MTMM analyses, time is considered to be a method variable,
and Marsh, Smith, Barnes and Butler (1984; also see Marsh, Barnes, & Hocevar, 1985;
Marsh & Butler, 1984) argue for multitrait-multimethod-multitime studies. Such an approach
is valuable when there is no systematic intervention between the different testing occasions,
but is probably not appropriate when there is a systematic intervention that has a significant
effect. While the Campbell-Fiske analysis is the most frequently used approach to the
examination of MTMM data, recent advances in the application of confirmatory factor
analysis have resulted in more sophisticated analytic techniques (e.g., Marsh et al., 1985;
Marsh & Hocevar, 1983). However, the clarity of the results based on the Campbell-Fiske
guidelines suggests that these techniques would add little to the conceptual understanding
of the data, and the technical detail entailed in their description and presentation would
detract from the focus of the present investigation. Nevertheless, the MTMM matrices
required to perform such analyses appear in Tables 3 and 4.
TABLE 3
MULTITRAIT-MULTIMETHOD (MTMM) MATRIX OF CORRELATIONS AMONG SCALES AT TIME 1 (ABOVE THE MAIN DIAGONAL) AND TIME 2
(BELOW THE MAIN DIAGONAL)

Scalesᵃ        1     2     3     4     5     6     7     8     9    10    11    12    13    14    15

Original
  1. GL        -    14    46    32    30  _61_ᵇ   09    53    37    24  _73_   -02    54    43    27
  2. PC       27     -    28   -05    04    34  _64_    11   -03    00    20  _66_    12    04    07
  3. SV       64    48     -    09    49    45    05  _54_    05    32    55    11  _ss_    18    31
  4. IC       40    07    33     -    11    10   -06    10  _60_    22    23   -08    17  _11_    08
  5. AS       39    08    46    29     -    37    09    43    12  _54_    34    06    32    31  _46_
Expanded
  6. GL     _76_    32    62    45    40     -    12    67    28    31  _65_    20    49    35    24
  7. PC       25  _87_    42    14    14    41     -    09    12   -13   -05  _64_    11    02   -06
  8. SV       62    38  _70_    34    42    84    48     -    13    36    52    10  _68_    17    27
  9. IC       44    18    39  _21_    34    51    29    50     -    25    21    02    22  _64_    09
 10. AS       47    13    44    30   _2_    60    30    62    50     -    35   -09    30    36   _2_
Ratings
 11. GL     _II_    37    72    42    43  _82_    45    74    50    57     -    11    64    42    42
 12. PC       30  _88_    44    18    12    40  _90_    44    24    20    47     -    02    05   -05
 13. SV       66    43  _zz_    39    44    76    44 _sg._    47    53    81    45     -    28    48
 14. IC       47    23    41  _28_    25    53    30    47  _ss_    43    53    29    54     -    30
 15. AS       50    25    55    31  _63_    61    28    61    45  _76_    52    24    59    45     -
Coefficient alphas
  Time 1      29    52    38    40    44    50    70    54    34    30    71    71    54    57    70
  Time 2      55    76    59    60    56    84    89    82    69    66    80    89    81    73    78

Note. All coefficients are presented without decimal points and those larger than .23 are statistically significant.
ᵃ See Table 2 for definitions of the scales.
ᵇ The underlined coefficients, shown here between underscores, are convergence coefficients, the correlation between the same trait inferred from two different response formats.

specific IE traits identified by Marsh and Richards (in press). Even though
the convergence coefficients were substantially higher for time 2 (Table 3),
support for the distinctiveness of the traits was evident for both time 1
and time 2.
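
The comparison counts used in guidelines (2) and (3) can be illustrated with a small sketch. It is not the authors' procedure or software: the function name is hypothetical and the correlation matrix below is a synthetic toy (same-trait pairs .70, same-method pairs .30, all other pairs .20), used only to show why five traits and three methods yield 120 comparisons of each kind.

```python
from itertools import combinations


def campbell_fiske_counts(corr, traits, methods):
    """Count how often convergence coefficients exceed their rival correlations.

    corr[i][j] is the correlation between variables i and j; traits[i] and
    methods[i] label variable i. Returns (passed, total) for each comparison type.
    """
    counts = {"heterotrait-heteromethod": [0, 0], "heterotrait-monomethod": [0, 0]}
    n = len(traits)
    for a, b in combinations(range(n), 2):
        # A convergence coefficient pairs the same trait across different methods.
        if traits[a] != traits[b] or methods[a] == methods[b]:
            continue
        convergence = corr[a][b]
        for k in range(n):
            # Rivals involve a different trait measured by one of the two methods.
            if traits[k] == traits[a] or methods[k] not in (methods[a], methods[b]):
                continue
            same, other = (a, b) if methods[k] == methods[a] else (b, a)
            rivals = (("heterotrait-monomethod", corr[same][k]),
                      ("heterotrait-heteromethod", corr[other][k]))
            for label, rival in rivals:
                counts[label][0] += convergence > rival
                counts[label][1] += 1
    return {label: tuple(value) for label, value in counts.items()}


# Synthetic 15-variable example: five facets crossed with three response formats.
facet_names = ("GL", "PC", "SV", "IC", "AS")
format_names = ("original", "expanded", "rating")
traits = [t for _ in format_names for t in facet_names]
methods = [m for m in format_names for _ in facet_names]


def toy_r(i, j):
    if i == j:
        return 1.0
    if traits[i] == traits[j]:
        return 0.70   # same trait, different method (convergence)
    if methods[i] == methods[j]:
        return 0.30   # different trait, same method
    return 0.20       # different trait, different method


corr = [[toy_r(i, j) for j in range(15)] for i in range(15)]
print(campbell_fiske_counts(corr, traits, methods))
# {'heterotrait-heteromethod': (120, 120), 'heterotrait-monomethod': (120, 120)}
```

Each of the 15 convergence coefficients is compared with 8 rivals of each kind (4 in its row and 4 in its column), which gives the 120 comparisons per guideline reported for the self-response matrices.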
Specific Rotter factors: Observer responses. Correlations among the
two sets of observer responses collected at time 2 appear as part of
Table 4. While a MTMM analysis is appropriate for the examination of
this 10 x 10 correlation matrix (i.e., five traits and two methods), the
interpretation is quite different from that described earlier. Convergence
in Table 3 represented correlations between self-responses by the same
person to the same set of items with different response formats. Con-
vergence here represents correlations between external observations by
different individuals to the same set of items with different response
formats. The application of the four Campbell-Fiske guidelines to the
MTMM matrix for the observer responses indicates that:
(1) Four of five convergence coefficients are statistically significant
(mean r = .29).
(2) Convergence coefficients (mean r = .29) are higher than other
correlations in the same row or column of the corresponding heterotrait-
heteromethod squares (mean r = .19) for only 27 of 40 comparisons.
(3) Convergence coefficients (mean r = .29) are higher than other
correlations in the same row or column of the corresponding heterotrait-
monomethod triangles (mean r = .51) for only 7 of 40 comparisons.
(4) The pattern of correlations among the different traits appears to
be similar for the two response formats (and also similar to the pattern
observed for the self-responses in that the correlation between GL and
SV is large, while correlations involving the PC factor and to a lesser
extent the IC factor tend to be lower).
In summary there is modest agreement between two different sets of
observers using different response formats (i.e., convergence) for four
of the five facets, but support for the divergent validity of the traits is
weaker. Support for both the convergent and divergent validity of the
Political Control facet is strongest, there is little support for either
convergent or divergent validity of IC, and support for the divergent validity
of the other three facets is weak. It is interesting to note that convergence
on PC (.47) and GL (.39) were comparable to convergence on the total
score (.41) even though these subscales were based on fewer items. The
results suggest that different external observers are able to agree on
some aspects of the IE construct, though apparently not on IC, and are
able to differentiate the PC from other facets.
Specific Rotter facets: Self and observer agreement. Correlations
between observer responses and self-responses also appear in Table 4.
While it would be possible to combine the three methods of self-response
(summarized above and in Table 3) and the two methods of observer
response (summarized above and in Table 4) into a single MTMM analysis,
such an analysis would confound different types of convergence and
different sources of method-variance. It is informative, however, to examine
the 30 correlations representing agreement between self-responses and
observer responses to matching facets (i.e., convergent validities, see
Table 4). Only 15 of these 30 correlations are statistically significant,
and none is larger than .33. None of the six convergent validities involving
the IC factor is statistically significant, while the mean coefficients for
the other four factors are modest: GL (.28), PC (.22), SV (.25), AS (.20).
This relative lack of convergent validity precludes support for divergent
validity based on these 30 convergent validities, as is evident in a more
detailed application of the Campbell-Fiske criteria.
Summary of agreement between different indicators. Based on the
total scores (Table 2) there is substantial agreement among self-responses
to different forms of the Rotter instrument, and modest agreement between
self-responses and responses by external observers. The MTMM analyses
based on self-responses to different forms of the Rotter (Table 3) provided
strong support for the convergence and also the divergent validity of
five specific facets previously identified in responses to the Rotter in-
strument. The analysis of agreement (Table 4) between responses by two
different sets of observers, and agreement between self-responses and
observer-responses, provided modest support for convergent validity but
weak support for the divergent validity of these facets. These findings
suggest that the different self-response formats measure a similar IE
construct and that scores from each clearly differentiate among multiple
facets of the construct. While the modest agreement between self-responses
and observers’ responses provides some support for the construct validity
of the IE construct, there was little evidence that this agreement was
specific to particular facets of the construct.

SUMMARY AND IMPLICATIONS


The purpose of the present investigation was to compare scores derived
from different self-response formats to the Rotter IE instrument: to compare
them to each other, to compare them in terms of inferring the effect of
a systematic intervention, and to compare them with the responses by
external observers. The IE scores derived from the different self-response
formats apparently provide inferences about a similar IE construct. Re-
sponses to each of the three formats were highly correlated with those
from other formats and, at a gross level, were similarly affected by
participation in the Outward Bound program, differentiated similarly among
the specific IE facets, and were similarly related to the responses by
external observers. Scores on the original Rotter format were substantially
less reliable (i.e., internally consistent) and less affected by the intervention
than were scores from the other two formats. However, support for the
divergent validity for the specific IE facets (Table 3) and agreement
between self-responses and observer-responses did not vary substantially
with the particular response format. These findings provide some evidence
against the dichotomous forced-choice format used in the original Rotter
instrument, but are not particularly compelling.

TABLE 4
CORRELATIONS AMONG OBSERVER RESPONSES AND SELF RESPONSES

Scales^a        1    2    3    4    5    6    7    8    9   10
Observer original
 1. GL          -
 2. PC         56    -
 3. SV         17   60    -
 4. IC         59   48   49    -
 5. AS         70   52   68   42    -
Observer expanded
 6. GL         x!   05   31   25   07    -
 7. PC         41   47   31   35   31   27    -
 8. SV         32   03   22   15   11   78   32    -
 9. IC         02   01  -01   OS   00   45   31   46    -
10. AS         33   18   35   21   27   53   32   67   26    -
Self original
11. GL         33   13   22   10   26   22   07   33   13   36
12. PC         32   21   30   18   21   39   18   37   29   21
13. SV         21   16  1J.   14   02   37  -02   II   23   20
14. IC         25   08   07   00   15   17   08   22   11   25
15. AS         02  -10  -05   04   II   09  -06   20   29    2
Self expanded
16. GL         19   03   14   03   14   21  -04   31   08   23
17. PC         35   22   29   33   22   26   29   24   17   21
18. SV         19   04   II   07   02   30  -10    2   07   21
19. IC         19   03  -10   iu   05   09   09   06   07   13
20. AS         13  -05  -06   01   14   15   03   28   12    2
Self ratings
21. GL         24   14   19   14   13   32  -06   30   10   26
22. PC         35   28   29   28   24   30   II   29   18   24
23. SV         29   19   22   14   12   33  -06   31   08   29
24. IC         27   05   04   00   11   13   04   15  -&I   22
25. AS         09  -11   05  -02   07   23  -04   29   18   22

Note. All coefficients are presented without decimal points and those larger than .23 are statistically significant. The underlined coefficients are
convergence coefficients, the correlation between the same trait inferred from different response formats and/or inferred by different individuals.
^a See Table 2 for definitions of the scales.

The assumption that internality and externality represent endpoints of
a bipolar continuum has important theoretical implications for IE research
and is implicit in the use of the original forced-choice format (and the
expanded forced-choice format). One implication of this assumption is
that independently derived scores of internality and externality should
correlate -1.0 with each other after correction for unreliability. When
subjects were asked to make independent ratings of the internal and
external statement from each Rotter item-pair, the correlation was only
modestly negative before the start of the intervention but substantially
negative after the intervention. The modest size of the correlation at
time 1 provides evidence against the bipolarity of the construct when
no intervention has taken place, and the intervention apparently affects
the extent of the bipolarity of the IE construct. A second implication of
this assumption is that if an intervention produces an increase in internality,
then it should produce a decrease in externality of a similar magnitude.
While the intervention did increase internality and decrease externality,
the effect size was substantially larger for the internal score. Taken
together, these findings cast doubt on the validity of the bipolarity
assumption.
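The correction for unreliability referred to above is conventionally the correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliability estimates. The following minimal Python sketch assumes that standard formula; the function name and the numeric values are hypothetical and are not taken from the present study.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures.

    r_xy  : observed correlation between the two scores
    rel_x : reliability estimate (e.g., coefficient alpha) of the first score
    rel_y : reliability estimate of the second score
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values: an observed internal-external correlation of -.30
# with reliabilities of .70 and .80.
corrected = disattenuate(-0.30, 0.70, 0.80)
print(round(corrected, 2))  # about -0.40, still far from -1.0

# Under strict bipolarity the corrected value should approach -1.0,
# e.g. an observed -.50 with both reliabilities equal to .50.
bipolar = disattenuate(-0.50, 0.50, 0.50)
```

Under these assumed values the corrected coefficient remains well short of -1.0, which is the pattern that would count against bipolarity.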
Rotter originally assumed that responses to his instrument were uni-
dimensional, but subsequent research has clearly shown this not to be
the case. Marsh and Richards (in press), on the basis of a literature
review and factor analysis, identified five factors in responses to the Rotter
instrument. The results of the MTMM analyses summarized in Table 3
provide strong support for these five facets and show that the differentiation
among the facets does not depend upon the response format. Also, the
size of the intervention effect and the extent of agreement between
self-responses and observer-responses varied with the specific IE facet.
However, the different scales, particularly those based on the original
format, are not sufficiently reliable (see Table 3) to be considered separately;
their practical application would require the construction of new scales
that contain more items and items that are more clearly related to the
specific facet that each is designed to measure.
The Outward Bound program is specifically designed to alter the IE
construct, and the Rotter IE instrument is specifically designed to measure
the IE construct. Thus, using a construct validity approach, the present
study provides support for the effectiveness of the intervention and for
the validity of the Rotter IE instrument. While alternative explanations
exist for the findings presented here, and those in Marsh et al. (1986),
a more detailed examination of the results renders some of them
implausible. First, Marsh et al. found significant effects on both the
Rotter IE construct and multiple dimensions of self-concept, but reported
that changes in self-concept were not substantially correlated with changes
in IE. Alternative explanations based on response biases, a placebo
effect, or what Marsh et al. called a post-group euphoria effect would
produce changes in these two self-report measures that were substantially
correlated. Second, the effects in the present study were consistent across
scores derived from three response formats, indicating that the results
do not depend on a particular format. Third, after the intervention the
internal scores for the independent rating format were more internal and
external scores less external, a finding that is inconsistent with many
forms of response bias. Fourth, observer-responses collected at the end
of the program were modestly correlated with self-responses at the end
of the intervention, but were less correlated with self-responses collected
before it. This finding, particularly when coupled with the increased
internality of the time 2 scores, suggests that the intervention produced
systematic, observable changes in the IE construct over the time 1/time
2 interval; it is unlikely that changes produced by most potential sources
of invalidity would be systematically related to observer responses. Hence,
while alternative explanations may be viable, the findings support the
effectiveness of the intervention in altering IE and the construct validity
of the Rotter IE instrument as an indicator of this change.
The responses of external observers to the Rotter items were also used
as a validity criterion in the present investigation. This use of the observer
responses assumes that the Rotter items are appropriate for use by ob-
servers, that the observers are able to infer IE, and that observers are
able to differentiate among the different IE facets. There was little evidence
to suggest that observers were able to differentiate among the specific
IE components in a manner that was similar to the self-responses. However,
the finding that two sets of observer responses are modestly correlated
with each other, and with the self-responses, provides support for the
first two assumptions and for the construct validity of the self-responses.
While the extent of self-observer agreement in this study is only modest,
Shrauger and Schoeneman (1979) reviewed studies of the agreement
between self-perceptions and the evaluations by others across a wide
variety of constructs and concluded that “there is no consistent agreement
between people’s self-perceptions and how they are actually viewed by
others” (p. 549). Hence, evidence for construct validity found in the
present investigation is stronger than typically reported in other research
that uses this approach (but see Marsh et al., 1985).
What are the implications of this study for the IE construct and the
Rotter instrument? The results of this study provide evidence for the
construct validity of the IE construct as assessed by the Rotter scale,
but also suggest problems with the instrument. First, there is evidence
that responses to the expanded and rating formats provide better measures
than does the original format. Second, the negative relationship between
internality and externality may not be sufficiently large to warrant the
forced-choice format used on the Rotter instrument. Third, responses
to the Rotter instrument are clearly multidimensional, but the separate
components are not measured with sufficient reliability to be practically
useful, and there is an insufficient theoretical basis to show that the
particular components included on the Rotter instrument are the most
appropriate for assessing a more general construct that is typically inferred
from the total score (see Marsh & Richards, 1986, for further discussion).
Fourth, once the multidimensionality of the construct is conceded, a host
of new theoretical issues about how statements are paired to form each
item pair must be examined. For example, only item pairs from the
Political Control facet are consistently comprised of two statements that
unambiguously represent the same facet, and this may explain why this
facet was shown to be the most clearly distinguishable facet in the present
research.
Lefcourt (1976, 1981) argued that the locus of control construct is
multidimensional and that to enhance the usefulness of the construct
researchers must develop distinct subscales that adequately assess goal-
specific or context-specific aspects of the construct. Similarly, while the
present research provides some support for the use of the Rotter IE
scale as a general measure, it also suggests that continued reliance on
this instrument may be counterproductive. Instead, it is proposed that
researchers develop instruments that adequately measure multiple facets
of the construct that are logically derived from a theoretical model, use
empirical procedures such as factor analysis and MTMM analysis to test
if these facets are reflected in responses to the instruments, and determine
if external validity criteria and interventions are logically, and differentially,
related to the multiple facets. Once a well-defined set of specific factors
has been identified, hierarchical factor analysis can be used to infer a
more general facet and to determine how well the specific facets are
explained by the general facet (see Marsh & Richards, in press; Marsh
& Hocevar, 1985). On the basis of such research it may then be possible
to select a relatively small number of items that adequately reflect the
breadth of specific components incorporated into the general construct,
the apparent goal of the original Rotter instrument.
REFERENCES
Abrahamson, D., Schuldermann, S., & Schuldermann, E. (1973). Replication of dimensions
of locus of control. Journal of Consulting and Clinical Psychology, 41, 320.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Collins, B. (1974). Four components of the Rotter Internal-External scale: Belief in a
difficult world, a just world, a predictable world, and a politically responsive world.
Journal of Personality and Social Psychology, 29, 381-391.
Dixon, P. N., McKee, C. S., & McRae, B. C. (1976). Dimensionality of three adult objective
locus of control scales. Journal of Personality Assessment, 40, 310-319.
Gurin, P., Gurin, G., & Morrison, B. M. (1978). Personal and ideological aspects of internal
and external control. Social Psychology, 41, 275-296.
Hull, C. H., & Nie, N. H. (1981). SPSS update 7-9. New York: McGraw-Hill.
Klockars, A. J., & Varnum, S. W. (1975). A test of the dimensionality assumptions of
Rotter’s Internal-External scale. Journal of Personality Assessment, 39, 397-404.
Lange, R. V., & Tiggemann, M. (1981). Dimensionality and reliability of the Rotter I-E
locus of control scale. Journal of Personality Assessment, 45, 398-406.
Lefcourt, H. M. (1976). The locus of control: Current trends in theory and research. New
York: Wiley.
Lefcourt, H. M. (1981). Research with the locus of control construct, Vol. I, Assessment
methods. New York: Academic Press.
Marsh, H. W. (in press). The hierarchical structure of self-concept and the application of
hierarchical confirmatory factor analysis. Journal of Educational Measurement.
Marsh, H. W., Barnes, J., & Hocevar, D. (1985). Self-other agreement on multidimensional
self-concept ratings: Factor analysis and multitrait-multimethod analysis. Journal of
Personality and Social Psychology, 49, 1360-1377.
Marsh, H. W., & Butler, S. (1984). Evaluating reading diagnostic tests: An application of confirmatory
factor analysis to multitrait-multimethod data. Applied Psychological Measurement,
8, 307-320.
Marsh, H. W., Cairns, L., Relich, J., Barnes, J., & Debus, R. L. (1984). The relationship
between dimensions of self-attribution and dimensions of self-concept. Journal of
Educational Psychology, 76, 3-32.
Marsh, H. W., & Hocevar, D. (1983). Confirmatory factor analysis of multitrait-multimethod
matrices. Journal of Educational Measurement, 20, 231-248.
Marsh, H. W., & Hocevar, D. (1984). The factorial invariance of students’ evaluations of
college teaching. American Educational Research Journal, 21, 341-366.
Marsh, H. W., & Hocevar, D. (1985). The application of confirmatory factor analysis to
the study of self-concept: First and higher order factor structures and their invariance
across age groups. Psychological Bulletin, 97, 562-582.
Marsh, H. W., & Richards, G. (in press). The multidimensionality of the Rotter I-E scale
and its higher-order structure: An application of confirmatory factor analysis. Multivariate
Behavioral Research.
Marsh, H. W., Richards, G., & Barnes, J. (1986). Multidimensional self-concepts: The
effect of participation in an Outward Bound program. Journal of Personality and Social
Psychology, 50, 195-204.
Mirels, H. L. (1970). Dimensions of internal versus external control as measured by the
Rotter scale. Journal of Consulting and Clinical Psychology, 34, 226-228.
O’Brien, G. E., & Kabanoff, B. (1981). Australian norms and factor analysis of Rotter’s
Internal-External Control scale. Australian Psychologist, 16, 184-202.
Richards, G. E. (1977). Some educational implications & contributions of Outward Bound.
Sydney: Australian Outward Bound Foundation.
Rotter, J. B. (1966). Generalized expectancies for internal versus external control of re-
inforcement. Psychological Monographs, 80, (1, Whole No. 609).
Rotter, J. B. (1975). Some problems and misconceptions related to the construct of internal
versus external control of reinforcement. Journal of Consulting and Clinical Psychology,
43, 56-67.
Shrauger, J. S., & Schoeneman, T. J. (1979). Symbolic interactionist view of self-concept:
Through the looking glass darkly. Psychological Bulletin, 86, 549-573.
Stipek, D. J., & Weisz, J. R. (1981). Perceived personal control and academic achievement.
Review of Educational Research, 51, 101-137.
Watson, J. M. (1981). A note on the dimensionality of the Rotter Locus of Control scale.
Australian Journal of Psychology, 33, 319-330.
Zuckerman, M., & Gerbasi, K. C. (1977). Dimensions of the I-E scale and the relationship
to other personality measures. Educational and Psychological Measurement, 37, 159-175.
