1987, Vol. 22, No. 1, 11-18
Behavior Analysis
Two Choices Are Not Enough
Murray Sidman
Northeastern University
I am deeply moved by the honor that this
award represents, because it is given to me
by representatives of those whose judg­
ment I value above that of all others. Thank
you for the opportunity to be here with you.
Don Hake was one of a distinguished
company of Behavior Analysts who pointed
out ways for us to go, but had to leave us
too soon. I think it is appropriate for us to
preserve the memory of all those who con­
tributed much to our own behavior as
scientists, and to Society's well-being, but
who are no longer with us to observe the ef­
fects they generated. It does Don's memory
no injustice on this or any other occasion to
link him with others, who would include
(among those whom I knew personally)
Ralph Hefferline, John Farmer, William
Cumming, Harold Weiner, Donald
Bullock, Charles Ferster, Robert Berryman,
Norman Guttman, Kenneth MacCor­
quodale, and Aaron Brownstein.
Each of these scientists and teachers
contributed differently, but a theme they all
stressed was respect for sound
methodology. Therefore, although conven­
tional wisdom suggests that one should not
emphasize technical details on an occasion
like this, I am going to do just that. I believe
that attention to methodology is a pre­
requisite both for valid research and for ef­
fective application. Because most of you are
actively engaged in behavior analysis, I
have reason to hope that you will find my
concerns relevant, and perhaps even interesting.

This paper was delivered as the Don Hake Address to the American Psychological Association, Washington, D.C., August 23, 1986. Reprints may be obtained from the author, 242 Beacon St., Boston, MA 02116.

I think it is worth taking a closer look at
the standard two-choice conditional
discrimination that many of us use to
investigate and teach cognitive skills. The
two-choice technique raises problems that
can easily lead us to false conclusions about
whether and what our students and sub­
jects have learned. Let me place my con­
cerns in the context of some work in which
I have been engaged for the past several
years.
Stimulus Equivalence
In early studies, we taught
retarded boys first to select appropriate
comparison pictures when we dictated
sample picture names to them, and then we
taught the boys to select printed words
when we presented the same names as
samples. With reinforcement, they learned
relations between 20 dictated words and
the corresponding pictures and printed
words, 40 relations in all.

Figure 1. Each box (A, B, C) represents a set of 20 stimuli. Arrows point from samples (presented one at a time) to comparisons (presented several at a time). Each solid arrow (AB, AC) represents 20 relations that were explicitly taught, with reinforcement. Each broken arrow (CB, BC) represents 20 relations that were tested (in extinction) but never explicitly taught.

Then we tested two kinds of relations
that the subjects had shown us in pretests
did not exist for them. We presented
pictures as samples, one at a time, along with
several printed comparison words, and we
also presented printed words as samples,
along with several comparison pictures. Figure
1 represents the relations we taught and
tested.
In the CB and BC tests, the subjects now
matched what we would consider the
appropriate picture to each printed word, and
the appropriate printed word to each pic­
ture. They had become able to do both
kinds of tasks even though they had been
unable before, and had never been rein­
forced explicitly for doing them. That was
the basic finding (e.g., Sidman, 1977).
As subsequent experiments became more
complicated, the techniques changed. The
use of stimuli like words, pictures, colors,
letters, numbers, and so on, made it
necessary to give the subjects pretests to
make sure they could not already do what
our teaching was supposed to make possible.
Besides using up valuable experimental
time, pretests have to be given without
reinforcement, or else subjects might learn
the test performances before the experiment
even starts. This early extinction can
cause all kinds of troubles, especially with
human subjects; for example, they can
leave, either physically or in spirit. And so,
to eliminate the need for pretests, we
started to use stimuli with which the
subjects could be expected to have had no
experience: arbitrary forms and sounds.
The technical detail I want to emphasize,
however, is not the kind of stimuli but the
number of incorrect alternatives presented
for comparison to each correct stimulus.
On any one trial in our early experiments,
subjects had to make their selection from a
group of eight possibilities, one correct and
seven wrong. As the experiments grew in
complexity, presenting eight comparisons
at a time proved impractical. Reducing the
number to three made it possible for us to
demonstrate what I still think is an
astounding phenomenon. Some of you have seen
this before (Sidman, Kirk, & Willson-Morris,
1985).

Figure 2. Each box shows 3 stimuli. Quotes indicate dictated stimuli. Arrows point from samples (presented one at a time) to comparisons (presented three at a time). Each solid arrow (AB, AC, DE, DF, EC) represents 3 relations that were explicitly taught, with reinforcement. For expository purposes, similar positions within boxes indicate correspondences between samples and comparisons. Each broken arrow represents 3 relations that emerged in tests, without reinforcement. (From Sidman, Kirk, & Willson-Morris, 1985.)

Figure 2 shows six groups of new stimuli,
three in each group. With AB, AC, DE, DF,
and EC each representing 3 sample-comparison
relations, we explicitly taught
15 relations; 60 new conditional
discriminations emerged. We started with three
3-member classes of equivalent stimuli:
One class contained the dictated word,
"delta," along with upper- and lower-case
delta; another contained the dictated
"sigma", along with upper- and lower-case
sigma, and the third class contained dic­
tated "xi", along with upper- and lower­
case xi. The existence of these classes was
demonstrated by the subjects' ability to
match the upper- and lower-case letters to
each other, even without having had direct
experience with those relations.
We then built the structure up to three
6-member equivalence classes by teaching
the EC, DE, and DF relations, adding a new
member to each class at each step. The
subjects demonstrated the existence of the
larger classes by their ability to match each
member of a class to any other member (in­
dicated by the broken arrows)-again, with­
out having had direct experience with the
tested relations, and with no reinforcement
during the tests.
Two-Comparison Conditional Discriminations
If we had continued to use eight com­
parisons per trial, we would have been able
to demonstrate eight 6-member equiva­
lence classes, but we would have needed
48 stimuli. Among such a large number of
stimuli, subjects would find many that they
would have good reason to relate to each
other even without ever having seen them
before. With only 15 visual stimuli in our
experiment, too, you can easily recognize
many possibilities for identity matching
(e.g., "triangles," "circles," etc.), and for
the abstraction of common elements.
After doing this experiment, I became
greedy. I reasoned, "We really need only
two comparison stimuli to document the
emergence of equivalence classes. If we
return to the traditional (and presumably
simpler) two-choice setup, we can probably
demonstrate the emergence of classes even
larger than six members. Furthermore, by
giving only two alternatives with each sam­
ple, we can probably study subjects who
come to us with fewer capabilities than any
we had looked at before."
And so, Cleeve Emurian and I, in col­
laboration with Michael Cataldo and John
Eisely at the Kennedy Institute in Balti­
more, began a project to find out if we
could use an equivalence test to measure
intellectual progress in people who started
out unable even to do simple conditional
discrimination. These were youngsters who
had suffered severe brain damage in violent
accidents that left them comatose for a long
time. Then, gradually emerging from coma,
they passed through a period of
wakefulness but complete sensory and
motor helplessness, followed by a slow
recovery of function, and eventually a
return to school.

Figure 3. Arrowheads, pointing from samples (presented one at a time) to comparisons (presented two at a time), represent relations that were explicitly taught, with reinforcement. Comparison stimuli varied in position. Arrows at the sides, grouped according to the class size they represent, indicate emergent relations, tested without reinforcement.

Figure 3 summarizes our plan. We
intended to check first for the emergence of
two 3-member classes of equivalent
stimuli, and then to test for successively
larger classes, building up to two 8-member
classes. We would start by teaching the AB
and BC relations and then giving the CA
test for stimulus equivalence. By matching
C1 to A1, and C2 to A2, the subject would
demonstrate the existence of two 3-member
classes, one containing A1, B1, and C1,
and the other containing A2, B2, and C2.
Then, after teaching the CD relations we
would give the DA test to determine
whether the D-stimuli had joined the
classes, enlarging them to four members. In
successive steps, we would teach DE, EF,
FG, and GH, testing EA, FA, GA, and HA
along the way to determine whether each
new pair of comparisons was added to the
existing classes.
As our subjects became more functional
in other ways, would the addition of new
relations to the structures bring about the
emergence of ever larger equivalence
classes? Might subjects at first fail to show
3-member classes, and then, after 3-member
classes "came in," fail to show
4-member classes, and so on, steadily
progressing through 5-, 6-, 7-, and 8-member
classes?
The Learning Criterion
Now for potential problems. Although
the first problem is general, giving subjects
only two choices per trial exacerbates it. An
experimenter or teacher always has to
make an early decision: How to decide
whether the subject or student has actually
learned the conditional relations that are to
generate the emergent performances?
It is tempting to assume that a score of
50% represents random behavior. Then, it
is easy to take another step and accept 75%,
significantly higher than chance, as
indicating that the subject has learned
something significant. With only two
comparisons, however, a score of 75% is
possible even though a subject or student is
doing something quite different from what the
experimenter or teacher intended (Sidman,
1980). Figure 4 illustrates this possibility.
The left matrix shows what we would
like to believe is happening when our
student scores 75% correct. When A1 is the
sample, she (or he) correctly selects
Comparison B1 on 75% of the trials, and when
A2 is the sample, selects Comparison B2 on
the same proportion of trials. On the
remaining 25% of the trials with each sample,
the student selects the incorrect comparison.
The center matrix shows another way to
achieve 75%. A frequent interpretation of
this matrix is that Sample A1 always
controlled correct choices, but that Sample A2
did so on only half the trials. Another
interpretation, however, is equally plausible.
One might say, instead, that Sample A2
never controlled a correct choice.
If one is alert to this possibility, one is
likely to come up with the matrix on the
right, which shows another view of the
data from the center matrix. We see now
that Sample A2 always controlled selections
of the left key, independently of the stimuli
at that position from trial to trial. Because
the comparison stimuli were distributed
evenly between both keys, the relation be­
tween Sample A2 and the position of the
keys caused a 50-50 split in the subject's
recorded stimulus selections in the center
matrix.
On trials with Sample A2, then, the
student may never have selected either of the
comparisons, B1 or B2, ignoring those
stimuli in favor of key position. There may
have been no relation between Sample A2
and the postulated comparison stimuli. This
could also mean that Comparison B2 never
exerted discriminative control in the
presence of either sample; even with A1 as
sample, the student may always have looked
for and selected Comparison B1, ignoring
any other stimulus. The discriminations
here were indeed conditional upon the
samples, but the comparisons need not
have been those the experimenter or
teacher has specified.
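
The center and right matrices can be reproduced with a small simulation. The sketch below is illustrative only: the function names and the four-trial block are assumptions, not part of the original procedure. It models a hypothetical subject who finds B1 whenever A1 is the sample but always presses the left key when A2 is the sample, and it tallies the same selections once by comparison stimulus and once by key position.

```python
from collections import Counter
from itertools import product


def run_block(policy):
    """Present each sample with each left/right arrangement once and tally selections."""
    by_stimulus, by_position = Counter(), Counter()
    correct = total = 0
    for sample, left in product(("A1", "A2"), ("B1", "B2")):
        right = "B2" if left == "B1" else "B1"
        side = policy(sample, left, right)           # "Left" or "Right"
        chosen = left if side == "Left" else right   # stimulus the apparatus records
        by_stimulus[(sample, chosen)] += 1
        by_position[(sample, side)] += 1
        correct += chosen == ("B1" if sample == "A1" else "B2")
        total += 1
    return correct / total, by_stimulus, by_position


def position_biased(sample, left, right):
    # Hypothetical subject: with A1, seek out B1 wherever it appears;
    # with A2, ignore the comparisons and press the left key.
    if sample == "A1":
        return "Left" if left == "B1" else "Right"
    return "Left"


accuracy, by_stimulus, by_position = run_block(position_biased)
print(accuracy)      # overall proportion correct
print(by_stimulus)   # center-matrix view: selections by comparison stimulus
print(by_position)   # right-matrix view: selections by key position
```

Scored by stimulus, the A2 trials split evenly between B1 and B2; scored by position, every A2 trial goes to the left key, even though overall accuracy is three out of four.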

Figure 4. Proportion of conditional-discrimination trials on which a subject selected each comparison (B1, B2) or key position (Left, Right) in the presence of a given sample (A1, A2). The two matrices on the left illustrate two ways an accuracy of 75% might come about. The matrix on the right, representing the same performance as the center matrix, substitutes key positions for comparison stimuli in the column headings.

A subject who (or which) has not really
learned the implied conditional relations
between A1 and B1 or between B1 and C1
(Figure 3) cannot be expected to show that
C1 and A1 have become conditionally
related. Uncritical acceptance of a 75% to
80% learning criterion is probably
responsible for some subjects' and students' failures
to show stimulus equivalence, or other
kinds of transfer to new situations, after
having been given the prerequisites in a
two-choice environment.
Even an overall 90% accuracy criterion is
not sufficient. Subjects who scored 100%
on the AB relations and 80% on the BC rela­
tions would achieve an average of 90%
even though one half of the performance is
suspect. One cannot, therefore, just
combine all the relations to determine a
learning criterion. Not only must the average
accuracy be high (95% is better than
90%), but each individual relation
must also meet the criterion.
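
A criterion check along these lines might be sketched as follows. The function name, the dictionary layout, and the default threshold are assumptions made for illustration; the only figures taken from the text are the 100% and 80% scores and the 90% overall criterion.

```python
def meets_criterion(scores, threshold=0.95):
    """Check the mean and every individual relation against one threshold.

    scores: dict mapping a relation name (e.g. "AB") to its proportion correct.
    Returns (mean, relations below threshold, overall pass/fail).
    """
    mean = sum(scores.values()) / len(scores)
    failing = sorted(name for name, p in scores.items() if p < threshold)
    return mean, failing, (mean >= threshold and not failing)


# The example from the text: 100% on AB and 80% on BC average to 90%.
mean, failing, passed = meets_criterion({"AB": 1.00, "BC": 0.80}, threshold=0.90)
print(round(mean, 2), failing, passed)
```

The average alone would satisfy a 90% criterion, but the per-relation check flags BC, so the subject is not counted as having learned the prerequisites.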

False Positives
We see, then, that using only two choices
per trial during the original teaching can
sow the seeds of failure during
subsequent testing for emergent relations.
Subjects or students may fail their tests
simply because they have not really learned
the prerequisites (a ubiquitous problem in
contemporary education). But that is the
good news-it can help explain failures to
observe what we would ordinarily have
reason to expect.
The bad news is that the use of only two
choices can also produce false positives.
Test results may make it look as though one
has successfully fostered stimulus
equivalence or some other mechanism of
transfer, but the success may be an artifact.
In spite of perfect test performances, the
students may have learned nothing that one
wanted them to learn. This, unfortunately,
will only show up later, when they fail to
learn their next lesson because they have to
build upon what they are supposed to have
learned before.
Suppose we have taught just the AB and
BC conditional discriminations (Fig. 3), the
smallest number necessary to test for
stimulus equivalence. Now, in the CA test
the subject always selects A1 when C1 is
the sample, and A2 when C2 is the sample.
It looks like a positive test for stimulus
equivalence.
But what if we gave the CA test without
having taught AB and BC? With C1 as the
sample, subjects might select A1 because
both stimuli have a small gap in an
otherwise continuous outline. Or they might
select A1 for less obvious and even
idiosyncratic reasons. Then, having related C1 to
A1 for some perfectly sensible reason, the
only remaining possibility is to relate C2 to
A2. With C2 as the sample, the 2-choice
situation makes it possible for subjects to
choose A2 simply because it is "the other
one." And so, many would test positively
anyway, even without
having had any experience with AB and BC.
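
The artifact can be sketched in a few lines. The model below is an assumption made for illustration: a subject who has never been taught AB or BC enters the CA test with one incidental pairing (C1 with A1, for whatever reason) and otherwise picks whatever comparison is left over. The stimulus labels follow the text; the functions and the three-class variant are hypothetical.

```python
def ca_test(n_classes, known_pairs):
    """Simulate a CA test for a subject who knows only `known_pairs`.

    For any other sample the subject rules out comparisons already tied to a
    known sample and responds only if exactly one comparison remains.
    """
    comparisons = [f"A{i}" for i in range(1, n_classes + 1)]
    taken = set(known_pairs.values())
    results = {}
    for i in range(1, n_classes + 1):
        sample = f"C{i}"
        if sample in known_pairs:
            results[sample] = known_pairs[sample]
            continue
        remaining = [c for c in comparisons if c not in taken]
        results[sample] = remaining[0] if len(remaining) == 1 else None
    return results


def score(results):
    """Proportion of samples matched to the same-numbered A stimulus."""
    hits = sum(choice == "A" + sample[1:] for sample, choice in results.items())
    return hits / len(results)


incidental = {"C1": "A1"}  # e.g. both figures happen to have a gap in the outline
two_choice = ca_test(2, incidental)
three_choice = ca_test(3, incidental)
print(two_choice, score(two_choice))
print(three_choice, score(three_choice))
```

With two comparisons the leftover is always determined, so the untrained subject scores perfectly; with three comparisons the same strategy leaves C2 and C3 unresolved.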
In a more complex experiment, the ar­
tifact remains simple but the illusion
becomes magnified. Suppose we have
taught not just AB and BC, but CD, DE, and
EF, also. Now, we give the FA test for
6-member classes. For reasons quite similar
to the false positive CA results we might
now come to the incorrect conclusion that
our subject had formed two 6-member
classes. Indeed, at every step, we might
have taught the subject that figures with
small openings in an otherwise continuous
outline go together. We might even get a
false positive HA test for 8-member classes
if the subject matches sample H2 to
comparison A2 (perhaps because these stimuli
share a horizontal line at their top) and
then matches H1 to A1 because that is the
only remaining possibility.
One can, of course, include controls in
one's experiments. For example, repeating
the experiment with the same stimuli, one
could reverse the comparison that goes
with each sample. But what a waste of time
and effort to find out, too late, that one's
success was a delusion. Why fall into such a
trap in the first place?
Another kind of control is demanded by
the logic of equivalence relations. To be cer­
tain that positive HA tests really indicate
8-member equivalence classes, one should
test also for each of the other relations that
are to be expected. The subject should be
able to match each stimulus to every other
stimulus in its class, showing all of the
relations depicted by the side arrows in Figure
3. If the subject does show all the derived
relations, the likelihood that the classes are
artifacts diminishes considerably. As the
size of the classes increases, false positives
become less likely. But if any lower-order
relation is missing, then one is obliged to
suspect the original data; if, for example, a
subject cannot do GA, the success with HA
must surely have arisen from some un­
wanted source.
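
To see how many tests this control implies, one can enumerate the sample-comparison pairings for the 8-member plan of Figure 3. The sketch below assumes the taught relations are AB through GH, as described earlier, and writes each relation with the sample letter first; the helper `ha_is_suspect` and its treatment of untested relations as not demonstrated are illustrative assumptions.

```python
from itertools import permutations

members = "ABCDEFGH"
taught = {a + b for a, b in zip(members, members[1:])}     # AB, BC, ..., GH
all_pairs = {s + c for s, c in permutations(members, 2)}   # sample letter first
derived = sorted(all_pairs - taught)                       # never explicitly taught


def ha_is_suspect(passed):
    """Flag a positive HA test when any other derived relation was not shown.

    passed: dict mapping a relation name to True/False. Relations missing from
    the dict are treated as not demonstrated (an assumption of this sketch).
    """
    others = [r for r in derived if r != "HA"]
    return passed.get("HA", False) and any(not passed.get(r, False) for r in others)


print(len(taught), len(derived))
all_ok = {r: True for r in derived}
print(ha_is_suspect(all_ok))
print(ha_is_suspect({**all_ok, "GA": False}))
```

Each of these letter pairs stands for two stimulus-level relations, one per class, so a full check of two 8-member classes involves twice as many individual relations as the number of derived pairs printed above.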
One can take pains to prevent the ob­
vious possibilities. On the other hand, one
person may not observe what is obvious to
another, and sometimes things become ob­
vious only after it is too late. For example,
startling though it may seem when one's at­
tention is called to it, I had failed to notice
at first that Class 1 might have come to
include as many as six elements simply on
the basis of the gaps in the outline of each
stimulus.
If subjects did establish Class 1 on that
basis, then on Class 2 trials they need simply
have made their selections by exclusion.
Class 1 would consist of all stimuli that had
a small opening in a continuous outline,
and Class 2 would contain any stimulus
that was not in Class 1. Although subse­
quent tests might seem to demonstrate
equivalence relations, these would not be
the relations the experimenter or teacher
had intended to teach. If, for example, the
subjects were then required to choose
among three comparisons, Class 2 would
no longer exist.
Conditional discriminations that give a
subject or student only two comparisons
with each sample pose another major prob­
lem: We are now aware that exclusion and
equivalence are incompatible. In his Ph.D.
dissertation, Philip Carrigan has provided a
thoroughgoing theoretical derivation and
experimental analysis of this incompatibili­
ty. I shall just give a bare outline here.
Let us look again at the AB relations in
Figure 3. With A1 as the sample,
Comparison B1 may be called positive and B2
negative, since choosing B1 will produce
reinforcement, and choosing B2 will not.
The subject may get to reinforcement,
however, by any or all of several routes. For
example, the controlling relation may be that
between A1 and its positive comparison,
B1, without involving B2 at all. One might
imagine the subject looking at the
comparison stimuli, disregarding anything that is
not B1, and then, when B1 is found,
selecting it. A similar controlling relation may
exist between Sample A2 and its positive
comparison, B2.
On the other hand, the critical relation
may be that between A1 and its negative
comparison, B2, without involving B1 at
all. One must imagine the subject looking at
the comparison stimuli, disregarding
anything that is not B2, and then, when B2 is
found, rejecting (excluding) it. Although the
subject would seem to have chosen B1, that
statement does not describe the controlling
relation at all, since Comparison B1 had
nothing to do with the subject's choice.
Selecting B1 was simply a byproduct of
excluding B2. A similar relation of exclusion
may exist between Sample A2 and its
negative comparison, B1.
One kind of controlling relation between
sample and comparison, then, produces
selection by exclusion, and the other produces
what we might call selection by choice. Our
usual objective record of the
stimuli selected on each trial does not tell us
which of these routes to reinforcement a
subject has taken.
Neither do the arrows in our diagrams
tell us the nature of the relation between
the stimuli. Our account of stimulus
equivalence so far has assumed implicitly
that the subjects learn selection by choice,
rather than by exclusion. Because of this
assumption, we have tended to interpret
the arrows as connecting a sample with the
comparison that our recording apparatus
says the subject selected. Now, however,
we must hedge this interpretation; an ar­
row represents a relation between stimuli.
If the relation was selection by choice, the
arrow happens also to point to the compari­
son that the subject selected. If the relation
was selection by exclusion, however, the
arrow points to the comparison that our
recording apparatus says the subject did not
select.
Keeping before us this broader
interpretation of the arrows in our diagrams, let us
look at what happens when we test the two
kinds of relations, selection by choice and
selection by exclusion, for transitivity.
Transitivity is required for equivalence,
and is somewhat simpler to test (Sidman,
Rauzin, Lazar, Cunningham, Tailby, &
Carrigan, 1982). Figure 5 will help us see what
happens to transitivity when the subject
selects by choice and by exclusion.

[Figure 5 diagram. Top row: A1 -> B1 -> C1 and A2 -> B2 -> C2.]

Figure 5. Transitivity tests of two relations, selection by choice and by exclusion. Arrows point from samples to related comparisons. Solid arrows represent relations that were explicitly taught, and broken arrows indicate emergent relations.

At the top of Figure 5, we have selection
by choice. When Sample A1 (top left) is
related by choice to Comparison B1, the
recording apparatus tells us that the subject
selects B1. Then, with B1 as sample, related
by choice to Comparison C1, the subject
selects C1. On the top right, we have
similar relations among A2, B2, and C2.
The bottom of Figure 5 shows selection
by exclusion. When Sample A1 (bottom
left) is related by exclusion to Comparison
B2, the recording apparatus tells us that the
subject selects Comparison B1, by
exclusion. Then, with B2 as sample, related by
exclusion to Comparison C1, the subject is
recorded as having selected C2.
When Sample A2 (bottom right) is related
by exclusion to Comparison B1, the
recording apparatus shows a selection of B2.
Then, with B1 as sample, even though the
controlling relation, selection by exclusion,
is really between B1 and Comparison C2,
we record a selection of C1.
Regardless of the kind of controlling
relation, selection by choice or by exclusion,
our record at the end of the teaching
process shows the same results: With Sample
A1 or B1, the subject selects Comparison
B1 or C1, respectively; with Sample A2 or
B2, the subject selects Comparison B2 or
C2, respectively. We cannot tell from these
recorded selections what the controlling
relation is. The separation comes, however,
when we test for transitivity.
Transitivity tests, indicated by the broken
arrows, call for the presentation of A1 or A2
as a sample, along with C1 and C2 as
comparisons. If the subject has learned
selection by choice (the upper part of Figure 5),
and if selection by choice is a transitive
relation, the subject will select Comparison
C1 when A1 is the sample, and Comparison
C2 when A2 is the sample. The emergent
relations, neither of them explicitly taught,
will be A1C1 and A2C2.
What is to be expected if the subject has
learned selection by exclusion? If selection
by exclusion (the lower part of Figure 5) is a
transitive relation, the subject will exclude
Comparison C1 when the sample is A1,
thereby producing a recorded selection of
Comparison C2. When the sample is A2,
selections of Comparison C1 (by exclusion)
will be recorded. The emergent relations,
neither of them explicitly taught, will now
be A1C2 and A2C1.
That is indeed the way it works out.
When the subject selects by exclusion,
transitivity tests yield results exactly the
opposite of those for selection by choice. The
same will therefore hold true for
equivalence, testing for CA rather than AC.
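
The argument of Figure 5 can be traced mechanically. The sketch below encodes the two sets of controlling relations exactly as described in the text; the function names and the dictionary representation are assumptions made for illustration. It first confirms that both kinds of control yield the same recorded selections during teaching and then composes each relation with itself to predict the AC transitivity test.

```python
# The comparison that is NOT a given comparison on a two-choice trial.
OTHER = {"B1": "B2", "B2": "B1", "C1": "C2", "C2": "C1"}

# Controlling relations: sample -> the comparison the sample is related to.
CHOICE = {"A1": "B1", "B1": "C1", "A2": "B2", "B2": "C2"}
EXCLUSION = {"A1": "B2", "B2": "C1", "A2": "B1", "B1": "C2"}


def recorded(relation, kind, sample):
    """Comparison the apparatus records as selected on a taught trial."""
    related = relation[sample]
    return related if kind == "choice" else OTHER[related]


def ac_test(relation, kind):
    """Recorded selections on the AC test if the relation is transitive."""
    out = {}
    for sample in ("A1", "A2"):
        derived = relation[relation[sample]]   # sample -> B -> C
        out[sample] = derived if kind == "choice" else OTHER[derived]
    return out


samples = ("A1", "B1", "A2", "B2")
training_choice = {s: recorded(CHOICE, "choice", s) for s in samples}
training_exclusion = {s: recorded(EXCLUSION, "exclusion", s) for s in samples}
print(training_choice == training_exclusion)   # identical teaching records
print(ac_test(CHOICE, "choice"))
print(ac_test(EXCLUSION, "exclusion"))
```

The teaching records cannot separate the two kinds of control, but the predicted AC selections are exact opposites, which is why a reversed test outcome points to selection by exclusion.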
Here, then, we have another reason for a
subject's or student's failure to develop
equivalence relations. Whenever tests yield
results exactly the opposite of what equiva­
lence requires, one must suspect selection
by exclusion.
It turns out that we were lucky in our
original experiments, when we used three
or more comparison stimuli. The use of
three choices, one correct and two incor­
rect, fosters selection by choice; exclusion
would require the subject to learn two rela­
tions, whereas choice would require only
one. In the two-comparison situation,
however, selection by exclusion is just as
efficient as selection by choice.
Like the other problems I have outlined,
this one, too, comes about because any par­
ticular reinforcement contingency can
generate more than one controlling
relation. Unless one carries out explicit probes,
one may be unable to determine where the
control lies.
How did our study of recovery from
brain damage turn out? Well, we did not
make some of the mistakes I have noted
here, but we did permit others, and so our
results are inconclusive. I think, though,
that the problem is worth looking into, and
I hope someone will do so. But whoever
does-please, give the subjects more than
just two choices.
References
Sidman, M. (1977). Teaching some basic prerequisites for reading. In P. Mittler (Ed.), Research to practice in mental retardation: Vol. 2. Education and training (pp. 353-360). Baltimore, MD: University Park Press.
Sidman, M. (1980). A note on the measurement of conditional discrimination. Journal of the Experimental Analysis of Behavior, 33, 285-289.
Sidman, M., Rauzin, R., Lazar, R., Cunningham, S., Tailby, W., & Carrigan, P. (1982). A search for symmetry in the conditional discriminations of rhesus monkeys, baboons, and children. Journal of the Experimental Analysis of Behavior, 37, 23-44.
Sidman, M., Kirk, B., & Willson-Morris, M. (1985). Six-member stimulus classes generated by conditional-discrimination procedures. Journal of the Experimental Analysis of Behavior, 43,