
ISSUES AND TRENDS

Jonathan Osborne and Maria Pilar Jiménez-Aleixandre, Section Coeditors

Basic Inferences of Scientific Reasoning, Argumentation, and Discovery

ANTON E. LAWSON
Organismal, Integrative and Systems Biology, School of Life Sciences, Arizona State
University, Tempe, AZ 85287, USA

Received 7 October 2008; revised 22 April 2009; accepted 29 April 2009

DOI 10.1002/sce.20357
Published online 28 May 2009 in Wiley InterScience (www.interscience.wiley.com).

ABSTRACT: Helping students better understand how scientists reason and argue to draw
scientific conclusions has long been viewed as a critical component of scientific literacy,
and thus remains a central goal of science instruction. However, differences of opinion persist
regarding the nature of scientific reasoning, argumentation, and discovery. Accordingly,
the primary goal of this paper is to employ the inferences of abduction, retroduction,
deduction, and induction to introduce a pattern of scientific reasoning, argumentation, and
discovery that is postulated to be universal and thus can serve as an instructional framework to
improve student reasoning and argumentative skills. The paper first analyzes three varied
and presumably representative case histories in terms of the four inferences (i.e., Galileo’s
discovery of Jupiter’s moons, Rosemary and Peter Grants’ research on Darwin’s finches,
and Marshall Nirenberg’s Nobel Prize–winning research on genetic coding). Each case
history reveals a pattern of reasoning and argumentation used during explanation testing
that can be summarized in an If/then/Therefore form. The paper then summarizes additional
cases also exemplary of the form. Implications of the resulting theory are discussed in terms
of improving the quality of research and classroom instruction. © 2009 Wiley Periodicals,
Inc. Sci Ed 94:336–364, 2010

Correspondence to: Anton E. Lawson; e-mail: anton.lawson@asu.edu


Contract grant sponsor: National Science Foundation.
Contract grant number: EHR 0412537.



INTRODUCTION
Scientific literacy as an instructional goal typically includes students’ understanding of
the nature of science and scientific reasoning (e.g., American Association for the
Advancement of Science, 1989, 2007; Educational Policies Commission, 1961, 1966; National
Research Council, 1990, 1996, 2001). Not surprisingly, numerous papers have been published
in recent years exploring the nature of scientific reasoning, scientific argumentation, and
scientific discovery. Yet differences and disagreements persist (e.g., Allchin, 2006; Alters,
1997; Bonner, 2005; Lawson, 2006a; Samarapungavan, Westby, & Bodner, 2006; Sampson
& Clark, 2008; Westerland & Fairbanks, 2004; Wivagg & Allchin, 2002). Accordingly,
the primary goal of this paper is to analyze several varied case histories to identify basic
inferences and a pattern of scientific discovery that is postulated to be general enough to
serve as an instructional framework to improve student reasoning and argumentative skills.
Identifying basic inferences and such a pattern within varied contexts should help teachers
and curriculum developers design and teach lessons that will help students construct better
understanding of how science works, thus help them become scientifically literate. The
examples may also help researchers improve the quality of their own research.
We should note at the outset that the present view departs somewhat from the view of
argumentation advanced by philosopher Stephen Toulmin (1969) and emphasized by science
educators such as Newton, Driver, and Osborne (1999) and Erduran, Simon, and Osborne
(2004) in that it sees the primary role of argumentation not as one of convincing others of
one’s point of view (although that is certainly part of the story) but rather as one of
discovering which of several possible explanations for a particular puzzling observation should
be accepted and which should be rejected. Thus, instead of Toulmin’s claims, warrants,
and backings, at the heart of the present theory lie multiple possible explanations,
predictions, and evidence designed to either support or contradict each proposed explanation.
Indeed, the present view suggests that the best argument considers all of the alternatives
and explicitly includes the relevant evidence and reasoning supporting and/or contradicting
each.
We begin by analyzing the reasoning presumably involved in Galileo Galilei’s (1564–
1642) discovery of Jupiter’s moons in 1610 (Galilei, 1610, as translated and reprinted in
Shapley, Rapport, & Wright, 1954, and as initially interpreted by Lawson, 2002a). Galileo’s
discovery is analyzed in terms of the inference of abduction, as defined in the present paper,
and the inferences of retroduction, deduction, and induction as defined by Charles Sanders
Peirce (1839–1914). Peirce was an American philosopher, logician, and mathematician. In
the words of Misak (2004), “His work is staggering in its breadth. . . . But because of the
scattered nature of his work and because he was always out of the academic mainstream,
many of his contributions are just now coming to light” (p. 1).
Peirce is generally credited as the originator of pragmatism, a philosophical view
opposing logical positivism and favored by contemporaries William James and John Dewey.
Pragmatism, with its roots in Darwinian evolutionary theory, argues against the existence
of absolute or transcendental truth and in favor of a more ecological account of knowledge
generation grounded in inquiry and in the testing and retention of ideas that work. Peirce’s
position most relevant to the task at hand is his view of how theory, observation, and
reasoning interact to test claims that have been advanced to explain puzzling observations.
Following the analysis of Galileo’s discovery, we turn to a similar analysis of Rosemary
and Peter Grants’ monumental research on Darwin’s finches, which is then followed
by consideration of the Nobel Prize–winning research of Marshall Nirenberg. These case
histories, in turn, are followed by additional examples that explore the extent to which
the resulting pattern of reasoning and argumentation can be generalized to other scientific
and nonscientific fields. The cases have been selected because they represent human
reasoning and discovery across a broad and presumably representative range of disciplines
(i.e., astronomy, evolutionary biology, biochemistry, human history, geology, physics, and
engineering).

GALILEO’S DISCOVERY OF JUPITER’S MOONS


Initial Puzzling Observation and Abduction
In January 1610, Galileo had recently invented a new and improved telescope and had
begun using it to explore the “heavens.” During his initial telescopic exploration, Galileo
was puzzled by his observation of three tiny points of light near the planet Jupiter. Initially,
he generated the hypothesis that they were fixed stars (i.e., stars that lie in the celestial
sphere beyond Jupiter). Following Peirce, we refer to this spontaneous and creative act of
hypothesis generation as abduction because the puzzling observation is seen as similar to,
or analogous to, already explained observations that have been stored as part of declarative
knowledge, thus get “abducted/stolen/transferred” from that store to tentatively explain the
new observation—the points of light look like fixed stars so perhaps that is what they are.
Other terms that have been used to label the process of abduction are analogical inference,
analogical transfer, or analogical reasoning (e.g., Biela, 1993; Finke, Ward, & Smith, 1992;
Gentner, 1989; Giere, Bickle, & Mauldin, 2006; Holyoak, 2005; Koestler, 1964; Lawson
& Lawson, 1993; Sternberg & Davidson, 1995). It should be pointed out that in many
cases, the abductive transfer requires more insight than shown in Galileo’s case because the
“distance” between the analogous category and the target phenomenon is greater (e.g., the
idea of orbiting planets as an analogue for the structure of atoms, Watson and Crick’s image
of a spiral staircase as an analogue for the structure of DNA, Darwin’s use of artificial
selection as an analogue for natural selection, and Kekule’s image of snakes eating their
tails as an analogue for the benzene ring).
Importantly, abduction can be viewed as an inferential process in the sense that it involves
reasoning used to mentally derive causal claims (i.e., hypotheses/theories) from premises
(cf., Polya, 1954; Tidman & Kahane, 2003). For example, if . . . planets orbit the Sun, and
. . . atoms are like the solar system, then . . . perhaps electrons orbit an atomic nucleus. If
. . . captive plants and animals have changed because of artificial selection, and . . . a process
analogous to artificial selection occurred in nature, then . . . perhaps wild organisms have
evolved due to a process of “natural” selection. If . . . homing pigeons navigate home using
the Earth’s magnetic field, and . . . salmon can sense that magnetic field, then . . . perhaps
they use it to return to their home streams.

Using Retroduction for an Initial Test


Once a hypothesis has been generated via abduction, it must pass its first inferential
test, which Peirce called retroduction (note that Peirce did not conceptualize abduction and
retroduction as different and distinct inferences; thus, he used the terms interchangeably)
and described as follows:

A puzzling observation C is made. However, if . . . A were true, then . . . C would be a matter
of course.

Therefore . . . there is reason to believe that A is true. (Turrisi, 1903/1997, CP 5.168)

And by philosopher Norwood Hanson (1958):

Before Peirce treated retroduction as an inference, logicians had recognized that the
reasonable proposal of an explanatory hypothesis was subject to certain conditions. The
hypothesis cannot be admitted, even as a tentative conjecture, unless it would account for
the phenomena posing the difficulty—or at least some of them. (p. 86)

And later by philosopher Carl Hempel (1966) like this:

When a hypothesis is designed to explain certain observed phenomena, it will of course
be so constructed that it implies their occurrence; hence, the fact to be explained will then
constitute confirmatory evidence for it. (p. 37)

Or more explicitly in Galileo’s case, like this:

If . . . the points of light near Jupiter are fixed stars, and . . . I observe the heavens near
Jupiter, then . . . I should see points of light that look like fixed stars. And . . . I do see points
of light that look somewhat like fixed stars. Therefore . . . they may in fact be fixed stars.

Thus, retroductive arguments follow an If/then/Therefore argumentative pattern. Although
retroduction is a crucial aspect of evaluating alternative explanations, it is a weak
test of a hypothesis in the sense that the least a hypothesis should do is to explain the
puzzling observation that led to its generation in the first place.¹ Interestingly, Galileo’s
further retroductive reasoning led him to doubt his fixed-stars hypothesis. As he put it:

. . . although I believed them to belong to the number of the fixed stars, yet they made me
somewhat wonder, because they seemed to be arranged exactly in a straight line, parallel to
the ecliptic, and to be brighter than the rest of the stars, equal to them in magnitude. (p. 59)

When Galileo’s expressed doubt is cast in the If/then/Therefore form of the previous
retroductive argument, we get the following:

If . . . the three points of light near Jupiter are fixed stars, and . . . their sizes, brightness and
positions are compared to each other and to nearby fixed stars, then . . . variations in size,
brightness and position should be random, as is the case for other fixed stars. But . . . “they
seemed to be arranged exactly in a straight line, parallel to the ecliptic, and to be brighter than
the rest of the stars.” Therefore . . . the fixed-stars hypothesis is contradicted. Or as Galileo
put it, “yet they made me somewhat wonder.”

¹ Although retroduction may be a weak test of a hypothesis, it can become a stronger test if the
hypothesis can retroductively explain several aspects of the puzzling observation. For example, Galileo’s
orbiting-moons hypothesis could retroductively explain (1) why the three points of light seemed to be
arranged exactly in a straight line, (2) why they were parallel to the ecliptic, and (3) why they differed in
brightness from the fixed stars.

Using Deduction to Generate Predictions

Consequently, Galileo went back to his store of declarative knowledge to abductively
generate another hypothesis. Perhaps, thought Galileo, the points of light are moons orbiting
Jupiter—like the moon that orbits Earth, or like the planets that orbit the Sun. Presumably
after using retroduction to convince himself that his orbiting-moons hypothesis would in
fact explain the puzzling observation, he then sought a way to construct a more convincing
test. That more convincing test required generating one or more predictions about future
observations that should occur provided that his new hypothesis is correct. Peirce referred
to this inference as deduction, which he described as follows:

Abduction [retroduction] having suggested a theory, we employ deduction to deduce from
that ideal theory a promiscuous variety of consequences to the effect that if we perform
certain acts, we shall find ourselves confronted with certain experiences. We then proceed
to try these experiments, and if the predictions of the theory are verified, we have a
proportionate confidence that the experiments that remain to be tried will confirm the
theory. (Bergman & Paavola, 1905/2003a, CP 8.209)

Thus, using deduction,² Galileo generated an argument that led from his new hypothesis
to two future predictions, that is,

If . . . the three points of light are moons orbiting Jupiter (orbiting-moons hypothesis), and
. . . I observe them over the next several nights (planned test), then . . . some nights they
should appear to the east of Jupiter and some nights they should appear to the west. Further,
they should always appear along a straight line on either side of Jupiter (predictions).

Making the Necessary Observations


After presumably deriving these two predictions via deduction, Galileo remarked, “I
therefore waited for the next night with the most intense longing, but I was disappointed
of my hope, for the sky was covered with clouds in every direction” (p. 60). So because of
cloud cover, Galileo was unable to make the necessary observations to compare with his
deductively derived predictions. Fortunately, the next night and several subsequent nights
were clear and sure enough, the points of light appeared just as Galileo’s orbiting-moons
hypothesis led him to predict. In Galileo’s (1610) words,

I, therefore, concluded, and decided unhesitatingly, that there are three stars in the heavens
moving about Jupiter, as Venus and Mercury round the sun. . . . These observations also
established that there are not only three, but four, erratic sidereal bodies performing their
revolutions round Jupiter. . . . These are my observations upon the four Medicean planets,
recently discovered for the first time by me. (pp. 60 – 61)

Using Induction to Draw a Conclusion


Consequently, we can complete the previous deductive argument with Galileo’s observed
results and conclusion like this:

And . . . some nights they appeared to the east of Jupiter and some nights they appeared
to the west. Further, they always appeared along a straight line on either side of Jupiter
(observed results). Therefore . . . the orbiting-moons hypothesis is supported (conclusion).

² Like abduction, deduction depends on connections with declarative knowledge. Declarative knowledge
is needed for the thinker to know what is implied by any particular hypothesis in question. Thus, one should
not view deduction as taking place in a strictly “logical” or “necessary” fashion or as a process resulting
in certainty. In this sense, both retroduction and deduction are dependent upon specifics related to the
situation at hand (i.e., what one might call disciplinary or declarative knowledge). For further explication,
see Lawson (2006b). Another key point here is that the thinker is generating predictions about outcomes
that the thinker has yet to observe. But this does not mean that the observations may not have already been
made by someone.

According to Peirce, this final inference, which he called induction, was used to draw
this conclusion (Turrisi, 1903/1997). More generally,

If . . . the predicted and observed results match, as they do in this case, then . . . the
hypothesis is supported. On the other hand, if . . . the predicted and observed results do not
match, then . . . the hypothesis is contradicted.

Although Peirce referred to this inference as induction, it is not the form of induction
that some have claimed generates general conclusions from limited cases (e.g., this crow is
black, so is this one, and so on—therefore all crows are black)—a form of “enumerative”
induction that people probably do not use (e.g., Lawson, 2005; Popper, 1965). Rather,
enumerative induction can at best suggest descriptive claims in need of deductive test (e.g.,
all of the crows I have seen are black; thus, perhaps, all crows are black. If . . . all crows are
black, and . . . this new bird is a crow, then . . . I deduce/predict that it will also be black).
The form of induction that Galileo presumably used can be characterized as an inference
that leads to increased confidence in one’s conclusions with each additional supporting or
contradicting result. In Peirce’s words,

If that supposition be correct, a certain sensible result is to be expected under certain
circumstances which can be created, or at any rate are to be met with. The question is,
Will this be the result? If Nature replies “No!” the experimenter has gained an important
piece of knowledge. If Nature says “Yes,” the experimenter’s ideas remain just as they
were—only somewhat more deeply engrained. If Nature says “Yes” to the first twenty
questions, although they were so devised as to render that answer as surprising as possible,
the experimenter will be confident that he is on the right track, since 2 to the 20th power
exceeds a million. (Turrisi, 1903/1997, CP 5.168)
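
Peirce’s arithmetic can be made explicit. As one illustrative reading (an assumption, not something stated in the quoted passage), suppose that each of the twenty questions is devised so that a mistaken supposition would receive a “Yes” with probability at most one half, and that the questions are independent of one another. Under those assumptions:

```latex
\[
  2^{20} = 1{,}048{,}576 > 10^{6},
  \qquad
  P(\text{twenty ``Yes'' answers by chance})
  \le \left(\tfrac{1}{2}\right)^{20}
  = \frac{1}{1{,}048{,}576} < 10^{-6}.
\]
```

On this reading, the experimenter’s confidence grows with each surprising “Yes,” yet the chance of a coincidental run never reaches zero, which is consistent with the point that this form of induction falls short of certainty.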

Note, however, that consistent with Peirce’s underlying pragmatism, this view of
knowledge generation falls short of certainty. One cannot be certain that an explanation is correct
because explanation generation is the product of human imagination and any number of
alternatives may lead to the same prediction. Hence, the subsequent observation of a specific
predicted result can, in theory at least, be taken as evidence for more than one explanation.
Likewise, one cannot be certain that a contradicted explanation is in fact wrong because a
mismatch between a prediction and an observation may not be due to a faulty explanation.
It may be due to a faulty test and/or to a faulty deduction. Also as pointed out by authors
such as Brannigan (1981) and Collins (1985), scientific claims are generated and evaluated
within social and cultural contexts that play a role in their acceptance or rejection. Recall
the words of Charles Darwin written in the concluding chapter of The Origin of Species
(initially published in 1859):

Although I am fully convinced of the truth of the views given in this volume under the form
of an abstract, I by no means expect to convince experienced naturalists whose minds are
stocked with a multitude of facts all viewed during a long course of years, from a point
of view directly opposite to mine. . . . but I look with confidence to the future, to young
and rising naturalists, who will be able to view both sides of the question with impartiality.
(1898 edition, pp. 294 – 295)

Similarly, in his autobiography, the physicist Max Planck (1949) wrote: “A scientific
truth does not triumph by convincing its opponents and making them see the light, but
rather because its opponents eventually die, and a new generation grows up that is familiar
with it.”

Figure 1. A model of the elements of If/then/Therefore reasoning and argumentation used during the generation
and subsequent test of proposed explanations. Arguments are retroductive when results have been obtained by the
thinker before hypothesis and prediction generation and deductive when results are obtained after.

Summary of Galileo’s Reasoning in Terms of Abduction, Retroduction, Deduction, and Induction
To summarize, we can identify four basic inferences and a pattern of scientific reasoning
and argumentation, as depicted in Figure 1 and in Table 1, as follows:

1. First, thanks to his new and improved telescope (the role of technology), Galileo
undertook a new exploration that led to a puzzling observation (the three unexplained
points of light near Jupiter).
2. Then, thanks to his prior store of declarative knowledge, Galileo used abduction to
generate a hypothesis (a tentative explanation) for the points of light (i.e., perhaps
they are fixed stars).
3. Next, Galileo used retroduction to subconsciously test his fixed-stars hypothesis,
which led to some doubt and then to rejection.
4. Then he once again used abduction to generate another hypothesis (the orbiting-
moons hypothesis), which when presumably checked by retroduction was supported.
5. He then used deduction to generate future predictions, which also require connections
in declarative knowledge.
6. Subsequently, after the cloud cover dissipated he made the necessary observations,
which matched his predictions.
7. Finally, on the basis of this match, he used induction to draw the conclusion that his
orbiting-moons hypothesis had been supported. Therefore, he was able to proudly
proclaim to the world that he was the first to discover “four, erratic sidereal bodies
performing their revolutions round Jupiter.”
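
The steps above can also be written down as a small data structure. The following is only a minimal sketch, not a model proposed in the paper: the class name `Argument`, its field names, and the `matches` flag (which records a human judgment about whether prediction and observation agree) are all hypothetical choices made for illustration. The retroduction/deduction distinction follows the Figure 1 caption, where an argument is retroductive when the result was obtained before the hypothesis and prediction were generated.

```python
from dataclasses import dataclass


@dataclass
class Argument:
    """One If/then/Therefore argument (all names here are illustrative)."""
    hypothesis: str                # If . . .
    planned_test: str              # and . . .
    prediction: str                # then . . .
    observation: str               # And/But . . . (the result actually obtained)
    matches: bool                  # judged agreement of prediction and observation
    result_known_beforehand: bool  # True if the result preceded the hypothesis

    def inference(self) -> str:
        # Retroductive if the result was already in hand, deductive otherwise.
        return "retroduction" if self.result_known_beforehand else "deduction"

    def conclusion(self) -> str:
        # Induction step: a match supports and a mismatch contradicts;
        # neither outcome is treated as proof or disproof.
        verdict = "supported" if self.matches else "contradicted"
        return f"Therefore . . . the {self.hypothesis} hypothesis is {verdict}."

    def render(self) -> str:
        connective = "And" if self.matches else "But"
        return "\n".join([
            f"If . . . the {self.hypothesis} hypothesis is correct,",
            f"and . . . {self.planned_test},",
            f"then . . . {self.prediction}.",
            f"{connective} . . . {self.observation}.",
            self.conclusion(),
        ])


# Galileo's two hypotheses, paraphrased from the arguments given in the text.
fixed_stars = Argument(
    hypothesis="fixed-stars",
    planned_test="the positions of the three points of light are compared",
    prediction="their positions should be random",
    observation="they appear exactly in a straight line parallel to the ecliptic",
    matches=False,
    result_known_beforehand=True,
)
orbiting_moons = Argument(
    hypothesis="orbiting-moons",
    planned_test="I observe them over the next several nights",
    prediction="some nights they should appear east of Jupiter and some nights west",
    observation="some nights they appeared east of Jupiter and some nights west",
    matches=True,
    result_known_beforehand=False,
)

if __name__ == "__main__":
    for argument in (fixed_stars, orbiting_moons):
        print(f"[{argument.inference()}]")
        print(argument.render())
        print()
```

Abduction itself is deliberately left outside the sketch: generating the hypothesis is the creative step, so here it is simply supplied as a string.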

TABLE 1
Basic Inferences of Scientific Reasoning, Argumentation, and Discovery

Inference | Question | Example
Abduction | What caused the puzzling observation (e.g., the three new points of light near Jupiter)? | If . . . points of light seen in the night sky are caused by fixed stars embedded in the celestial sphere, and . . . three new similar looking points of light are seen in the night sky, then . . . perhaps they also are fixed stars.
Retroduction | Does the proposed cause explain what we already know? | If . . . the points of light are fixed stars, and . . . their positions are compared to each other, then . . . their positions should be random. But . . . they appear exactly in a straight line parallel to the ecliptic. Therefore . . . perhaps they are not fixed stars.
Deduction | What does the proposed cause lead us to predict about future observations? | If . . . the three points of light are moons orbiting Jupiter, and . . . I observe them over the next several nights, then . . . some nights they should appear to the east of Jupiter and some nights they should appear to the west. Further, they should appear along a straight line on either side of Jupiter.
Induction | How do the predictions and new observations compare? | If . . . the new observations match the predictions based on the orbiting-moons hypothesis, as they do in this case (e.g., some nights the lights appeared to the east of Jupiter and some nights they appeared to the west), then . . . the hypothesis is supported.

Viewed in this way, scientific reasoning and discovery consist of undertaking novel
explorations that lead to puzzling observations that are subsequently explained by the
cyclic and repeated use of abduction, retroduction, deduction, observation, and induction.
Again in Peirce’s words,

Abduction [retroduction] furnishes all our ideas concerning real things, beyond what are
given in perception, but is mere conjecture, without probative force. Deduction is certain but
relates only to ideal objects. Induction gives us the only approach to certainty concerning
the real that we can have. In forty years diligent study of arguments, I have never found
one which did not consist of these elements. (Bergman & Paavola, 1905/2003a, CP 8.209)

We next consider Rosemary and Peter Grants’ monumental research on Darwin’s finches
of the Galapagos Islands to see if we can identify the same inferences and pattern of
reasoning and argumentation in biological discovery.

ROSEMARY AND PETER GRANTS’ RESEARCH ON DARWIN’S FINCHES
According to Allchin (2006): “Scientists follow many methods: namely whatever works
or seems appropriate to the task at hand. Hence, Rosemary and Peter Grants’ work on
Darwin’s finches—massive data collection done without any explicit hypothesis (as one
notable case) has nonetheless led to significant and widely respected claims” (p. 118).
Is Allchin’s characterization of the Grants’ work correct? If so, then their research
certainly would not fit the previous pattern. However, in characterizing the Grants’ research,
Allchin failed to cite any of their original accounts. Indeed, when one does consult what
the Grants say they did and when they did it, a very different picture emerges (e.g., Grant,
1986; Grant & Grant, 1989; P. R. Grant, personal communication, April 4, 2006). Thus,
let us take a close look at just what the Grants had to say about their research (also see
Lawson, 2009a). First, consider Peter Grant’s comments in the Preface of his 1986 book
Ecology and Evolution of Darwin’s Finches:

I chose to study the finches for two quite different reasons. The first arose from a confusion
about the significance of population variation. . . . The second reason sprang from a similar
confusion concerning inter-specific competition. . . . Since, the classical case of character
displacement was invalid (Grant 1972b, 1975a), it was logical to turn attention to the
classical case of character release. The classical case involves two species of Darwin’s
Finches on the Galapagos Islands (Brown & Wilson 1956). Despite having two good
reasons for studying the finches, I might never have begun research on them without the
stimulus of a proposal from a prospective postdoctoral Fellow, Ian Abbott. He had developed
a plan for detecting the effects of inter-specific competition among Darwin’s Finches. . . .
We prepared a research proposal and sought financial support. (pp. xi – xii)

Also consider these two quotes from the Preface and opening chapter of Rosemary and
Peter Grants’ follow-up book Evolutionary Dynamics of a Natural Population: The Large
Cactus Finch of the Galapagos (Grant & Grant, 1989):

Genetic variation in quantitative characteristics is the raw material for much of evolution.
A substantial body of theoretical work deals with the maintenance and significance of such
genetic variation. Field studies of the subject have been largely neglected, yet such studies
that employ a theoretical framework can be immensely valuable. (p. xvii)

The theoretical framework sets the scope of the study and helps us to identify major factors
in need of measurement. (p. 11)

These quotes should make it clear that the Grants’ data collection was directed and
preceded by a theoretical framework—specifically evolutionary theory and the classical
case of character release. Also consider this passage from Peter Grant (1986), a passage
that provides the general sequence of their reasoning and research:

Testing the competition hypothesis is difficult, for two reasons. First, the hypothesis deals
with the past. Since we cannot reconstruct those events precisely, we cannot test the
hypothesis directly . . . instead it must be tested through its consequences (predictions). . . .
To put the arguments into a testable framework we must rephrase them, along the following
lines. The observations to be explained are the distributions of species and the inter-island
differences in beak size and shape; the hypothesis is that distribution and morphology
were causally influenced by inter-specific competition for food; the main assumption upon
which the hypothesis rests is that the feeding niche of a population is reflected in, and hence
adequately indexed by, the average beak characteristics. . . . I shall now give two examples
of an examination of the hypothesis through a test of its predictions. . . . We should expect
that G. conirostris on Espanola, with mean beak characteristics intermediate between those
of the absent G. magnirostris, G. fortis, and G. scandens, has an intermediate feeding niche
position too. Not only that, it is expected to combine the niches of the three missing species,
and consequently its niche should be particularly broad. These are falsifiable predictions
because they are not necessarily true³ . . . data to test these predictions were collected in the
early and middle dry seasons . . . in 1973 – 1979. . . . The predictions were supported by the
results. (p. 301)

Accordingly, we can summarize the Grants’ research in terms of the four inferences
previously used to characterize Galileo’s research. A time sequence is also included to
document the order of events and to make clear that, contrary to Allchin’s claim, the
generation of explicit hypotheses and the deductive derivation of predictions preceded their
“massive” data collection.

Puzzling Observation and Causal Question


A puzzling observation that confronted 20th century biologists was the distribution of
finch species on the Galapagos Islands and their interisland differences in morphology.
More specifically, what caused (in terms of evolutionary theory) the distributions and the
interisland morphological differences, such as the present-day intermediate beak size, of G.
conirostris? David Lack had raised this general causal question in the literature as early as
1947, and according to Peter Grant (1986), it “. . . had, by 1971, not lost any of its freshness”
(p. xi).

Abduction
The evolutionary-based hypothesis put forth to account for the distributions and mor-
phological differences was that they were caused by inter-specific competition for food.
Peter Grant discussed this hypothesis with respect to G. conirostris in a paper published
in 1972 (see Grant, 1986, p. xii). By 1972, then, the hypothesis must have been part of
Peter Grant’s declarative knowledge and so must have been generated earlier, perhaps in
response to his reading of the Brown and Wilson paper that discussed the classical case of
character release involving two similar species of Darwin’s finches. The hypothesis
was thus abductively generated (e.g., If . . . the characteristics of the Darwin’s finches
studied by Brown and Wilson were caused by character release, then . . . perhaps the interisland
morphological differences of G. conirostris were similarly caused by character release).

Retroduction
This character-release (i.e., inter-specific competition) hypothesis could then be retro-
ductively tested with an argument that would look something like this:

3
Peter Grant’s use of the term falsifiable should not be seen as adoption of the view (sometimes and
probably mistakenly attributed to Karl Popper) that science progresses only via the falsification (disproof)
of explanatory hypotheses. Upon reflection, it should be clear that a scientist with a novel explanation for
some puzzling observation does not want to falsify his or her explanation (e.g., Woodward & Goodstein,
1996). However, as Grant states, the approach does oblige the scientist to derive and conduct tests that
could in principle contradict the hypothesis in question. One should say contradict, but not falsify, because,
as mentioned, the source of a mismatch between predicted results and observed results might not be due
to a faulty hypothesis. Instead, it might be due to a faulty test and/or a faulty deduction. Nevertheless,
scientists must be willing and able to test their proposed explanations by planning tests that deductively
yield predicted results that may in fact not occur, thus potentially contradict their explanations. For example,
in Galileo’s case, had he not observed, on subsequent nights, the points of light to the east and then to the
west of Jupiter, as predicted, his observations would have contradicted (i.e., “falsified”) his orbiting-moons
hypothesis.

If . . . inter-specific competition during the past with G. magnirostris, G. fortis, and G.
scandens caused the present-day intermediate beak size of G. conirostris (hypothesis),
and . . . one examines the beak size of G. conirostris on the island of Espanola where
G. magnirostris, G. fortis, and G. scandens are now missing (imagined test), then . . . G.
conirostris should have an intermediate beak size (prediction). And . . . G. conirostris does
have an intermediate beak size (past observation). Therefore . . . retroductive support exists
for the hypothesis.

Deduction
As mentioned, one should not simply retroductively test hypotheses. One should also
deductively derive predictions that can then direct the collection of future relevant data.
Accordingly, to further and convincingly test the inter-specific competition hypothesis, the
following deductive argument was generated:

If . . . inter-specific competition during the past with G. magnirostris, G. fortis, and G.
scandens caused the present-day intermediate beak size of G. conirostris (hypothesis), and
. . . one examines the beak characteristics and feeding niche of G. conirostris on the island
of Espanola where G. magnirostris, G. fortis, and G. scandens are now missing (imagined
test), then . . . in addition to its beak characteristics intermediate between those of its now
missing competitors, G. conirostris should also have an intermediate feeding niche position
and a particularly broad niche combining those of its three missing competitors (deduced
predictions).

Data Collection and Induction


Armed with these predictions, the Grants then sought funds from the National Science
Foundation. Funds were obtained and their first trip to the Galapagos Islands took place
in 1973 (Grant, 1986, p. xii). The relevant data were then collected from 1973 through
1979. Subsequent data analysis indicated that, as predicted, G. conirostris does have an
intermediate and particularly broad feeding niche. Because these observed results matched
the predicted results, the Grants then presumably used induction to conclude that the
inter-specific competition hypothesis had been supported, that is: If . . . the predicted and
observed results match, as they do in this case, then . . . the hypothesis is supported.
Therefore, following Peirce’s lead, we have identified the inferences of abduction, retro-
duction, deduction, and induction in the Grants’ research—research that can accordingly
be summarized in terms of the same If/then/Therefore pattern of reasoning and argumen-
tation that we found in Galileo’s discovery of Jupiter’s moons. We next turn to Marshall
Nirenberg’s Nobel Prize–winning biochemical research conducted during the early 1960s
to see whether it employed the same elements and followed the same argumentative pattern.

THE NOBEL PRIZE–WINNING RESEARCH OF MARSHALL NIRENBERG


Bonner (2005) argued for the existence of at least two scientific methods, which he
referred to as Method A and Method B.⁴ Bonner described Method A as one in which
puzzling observations provoke hypothesis generation, which then guides the selection and
planning of future deductive tests, which are followed by the gathering and analysis of
data. When using Bonner’s Method B, however, hypotheses are generated only after the
collection of data. The hypotheses then serve to explain the already gathered data.

⁴ One should not interpret Bonner’s use of the term method to imply a set of steps that ensures success.
Because creativity is involved in doing science, both in terms of generating interesting explanations and in
figuring out ways to test them, it is preferable to interpret the word “method” as a general plan of what
needs to be done. In a sense, having a plan helps. Even though you may not know exactly what to do in
each specific case, at least you consciously have a plan of what you should do. And if you do not have a
plan, you cannot play the game, at least not very well (e.g., Platt, 1964).

In support of the existence and usefulness of Method B, Bonner cites the Nobel Prize–
winning research of Marshall Nirenberg conducted during the early 1960s. According
to Bonner, Nirenberg’s research followed Method B and asked this descriptive question:
“What amino acid does UUU code for?” At the time, biologists thought that the DNA code
consisted of four letters (adenine, A; guanine, G; cytosine, C; and thymine, T). They also
suspected that the DNA code was first translated into an RNA code, also with four letters,
but with uracil (U) substituting for thymine (T). Hence, an RNA code consisting of
combinations of As, Gs, Cs, and Us somehow coded for the production of proteins by
stringing the 20 or so amino acids together. So according to Bonner’s interpretation of
Nirenberg’s research, there could have been any 1 of 20 answers to his descriptive question
(e.g., UUU codes for serine, UUU codes for valine, UUU codes for phenylalanine).
In Bonner’s view, Nirenberg harbored no hypotheses and advanced no predictions about
which amino acid would be produced. Nirenberg simply wanted to know which of the 20
amino acids UUU codes for. In other words, the fact that it turned out to be phenylalanine
was just the way it turned out and was no more or less theoretically significant than UUU
coding for valine, serine, or any of the other 20 or so possibilities.
Based on Bonner’s view of Nirenberg’s research, it is surprising to learn how others at
the time responded when they learned of Nirenberg’s phenylalanine result. For example,
consider this response by Francis Crick contained in a paper published in Nature (Crick,
Barnett, Brenner, & Watts-Tobin, 1962):

At the recent Biochemical Congress in Moscow, the audience of Symposium I was startled
by the announcement of Nirenberg that he and Matthaei had produced polyphenylalanine
(that is, a polypeptide all the residues of which are phenylalanine) by adding polyuridic acid
(that is, an RNA the bases of which are all uracil) to a cell-free system which can synthesize
proteins. (p. 1232)

One has to wonder why the audience was “startled” to learn that a string of Us codes
for phenylalanine and not for, say, valine or serine. Perhaps there is more to the story than
Bonner is acknowledging. Also consider Crick’s comment in a letter to Nirenberg dated
January 4, 1962: “The English papers have made rather a fuss about our Nature paper,
which was published on Saturday, but as far as I have stressed that it is your discovery
which was the real break-through.”
Crick’s breakthrough sentiment about Nirenberg’s research was echoed in two other
letters to Nirenberg. One letter from the famous French researcher Francois Jacob dated
December 20, 1961, had this to say: “Many thanks for your two manuscripts. It is a
wonderful story. All my congratulations.” The other letter from H. J. Muller of Indiana
University dated February 1, 1962, stated,

Let me express the thanks and appreciation of the Committee that arranged the recent
symposium on RNA coding for your kindness in having come here for the truly remarkable
contribution that you have made. It was inspiring to the older and to the younger hearers
alike to follow the course of the marvelous break-through that you described to us. (All
letters are online at http://profiles.nlm.nih.gov/)

Nirenberg’s colleagues were not the only ones startled and impressed by his “wonderful
story”—his “marvelous breakthrough.” The newspapers were also lauding Nirenberg’s
achievement. Importantly, they placed it in the larger theoretical context of the day. Consider,
for example, the following paragraphs written in an article, titled “NIH Researchers Crack
the Genetic Code,” published in the Medical World News (January 5, 1962):

The enigma of genetic coding, considered a fundamental secret of life, may be on the verge
of solution. In just-published and about-to-be published papers, several research teams are
reporting experimental proof of what has been largely theory: the intricate process by which
structure and function of living organisms are shaped. One group has begun to crack the
DNA-RNA code—the key to the whole mystery. Soon they expect to decipher the entire set
of instructions by which genetic messengers direct the manufacture of proteins—the basic
stuff of life.

The major achievement in RNA research is the work of two young biochemists at the
National Institute of Arthritis and Metabolic Diseases, Drs. Marshall W. Nirenberg and J.
Heinrich Matthaei. Behind their work, however, is a whole series of investigations which
has produced the basic theory and its preliminary experimental support.

Fundamentally, the theory states that the hereditary “blueprints” of the cell structure and
function are coded within the cell nucleus as long-chain molecules of deoxyribonucleic
acid (DNA). These plans are transmitted, in a series of steps, to the cytoplasmic “assembly
line” where they direct the synthesis of each cell’s characteristic products. (p. 18, online at
http://profiles.nlm.nih.gov/)

If we assume that this is a relatively accurate account, then we can see why Nirenberg’s
result caused such a fuss. He not only answered Bonner’s narrow descriptive question
but also provided a key piece of evidence to help answer a much broader causal question,
namely: How does DNA code for the production of proteins? Importantly, by helping answer
this more fundamental theoretical question, Nirenberg had begun to “crack” the genetic
code—a breakthrough worthy of a Nobel Prize.
Thus, Bonner’s characterization of Nirenberg’s research as descriptive and exemplary of
Method B appears misleading. A more accurate interpretation is that Nirenberg was using
Method A. Consequently, his research can be better understood as a theory-driven attempt
to find out how the letters of DNA code for the production of proteins. To do so, Nirenberg
generated a theory claiming that (a) specific combinations of at least three of the four letters
of DNA first serve as a template for the production of RNA; (b) specific combinations of at
least three of the four letters of RNA then serve as a template for sequencing specific amino
acids; and (c) amino acids when strung together make proteins. Accordingly, Nirenberg’s
reasoning and his key deductive and inductive argument can be summarized similarly to
the previous cases of Galileo and the Grants, that is,

If . . . the above theory is correct, and . . . we conduct an experiment with RNA made only
of U’s (imagined test), then . . . a polypeptide molecule should be synthesized and it should
consist of only one type of amino acid (predicted result via deduction). And . . . when Nirenberg
and Matthaei (1961) conducted the test, they found that a polypeptide chain consisting of
only one type of amino acid (i.e., phenylalanine) was produced (observed result). Therefore
. . . support had been found for the theory⁵ (conclusion via induction).

⁵ When this If/then/Therefore characterization was read to Nirenberg during a telephone conversation,
he replied: “That’s exactly right.” Also consistent with Nirenberg’s use of Method A and his goal of theory
testing, he said that at the time he did not even know whether the message came from DNA or from RNA,
or for that matter if mRNA even existed (M. W. Nirenberg, personal communication, December 2005).

Of course, additional questions remained. Is the code a triplet code? Is it nonoverlapping?
Is it degenerate? Nevertheless, presumably thanks to the use of Method A, the “marvelous
breakthrough” had been made. The basic theory had been supported and the genetic code
was beginning to crack.

A Closer Look at Bonner’s Method B


Interestingly, in terms of inferences, Bonner’s Method B appears to involve the use of
retroduction in the sense that hypotheses are generated to explain some puzzling aspect
of previously gathered data. This interpretation implies that Method B is really only part
of the process. As mentioned, following a successful use of retroduction, a more difficult
and more convincing test needs to be conducted in which the hypothesis is used to deduce
a prediction(s) about how some future test(s) should turn out.
Another problem with Method B is its apparent weakness in terms of deciding what data
to gather at the outset. Without some theory or hypothesis to guide one’s search, how is one
supposed to know what data to collect or what experiment to conduct? Philosopher Carl
Hempel (1966) put it this way:

In sum, the maxim that data should be gathered without guidance by antecedent hypotheses
about the connections among the facts under study is self-defeating, and is certainly not
followed in scientific inquiry. On the contrary, tentative hypotheses are needed to give
direction to scientific investigation. (p. 13)

Similarly, philosophers Theodore Schick and Lewis Vaughn (1995) commented,

A moment’s reflection reveals that data collection in the absence of a hypothesis has little
or no scientific value. Suppose, for example, that one day you decide to become a scientist
and having read a standard account of the scientific method you decide to collect some
data. Where should you begin? Should you start by cataloging all the items in your room,
measuring them, weighing them. . . ? Clearly there’s enough data in your room to keep you
busy for the rest of your life. (p. 191)

More recently, however, Mahootian and Eastman (in press) argue that the volume of ob-
servational data and the power of high-performance computing have increased by several
orders of magnitude and have reshaped the practice of science much in the way of Bonner’s
Method B. They advance what they call an observational-inductive (OI) approach to de-
scribe that new practice and to complement what they call the old hypothetico-deductive
(HD) approach. For example, one could now measure, say, 100 different variables and
use a high-powered computer to virtually instantaneously calculate correlation coefficients
among all 100 variables. Then without any prior hypotheses, one could sift through the
resulting coefficients to see which ones are relatively large (e.g., ≥0.80). Then upon finding
any such large coefficients, one could generate hypotheses to tentatively explain them, then
deduce predictions, and so on.
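The screening step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mahootian and Eastman’s procedure: the data are randomly generated, the variable names (v0, v1, and so on) and the helper functions pearson and screen are hypothetical, and the threshold is applied to the absolute value of each coefficient so that strong negative relationships are also flagged.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def screen(variables, threshold=0.80):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    hits = []
    for (a, x), (b, y) in itertools.combinations(variables.items(), 2):
        r = pearson(x, y)
        if abs(r) >= threshold:
            hits.append((a, b, r))
    return hits


random.seed(0)  # fixed seed so the illustration is repeatable
# Hypothetical data: 100 unrelated variables measured on 200 cases.
variables = {f"v{i}": [random.gauss(0, 1) for _ in range(200)] for i in range(100)}
# Plant one strong relationship so the screen has something to find.
variables["v1"] = [x + 0.1 * random.gauss(0, 1) for x in variables["v0"]]

hits = screen(variables)
for a, b, r in hits:
    print(f"{a} and {b}: r = {r:.2f}")
```

Each pair that survives the screen is only a candidate puzzling observation; explaining it and deducing further predictions still lie ahead.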
Of course, in theory, one could do this. But we can hear Hempel, Schick, and Vaughn
ask: Why would our imaginary scientist choose those 100 variables and not some other
100? Are there not prior conceptions (i.e., prior hypotheses/theories) involved in knowing
which variables to select and which to omit? If true, then Mahootian and Eastman are
not advancing a fundamentally different approach. Rather our imaginary scientist is still
using prior hypotheses/theories, albeit perhaps on a subconscious plane, to select which
variables to pay attention to and which ones to ignore. When the resulting coefficients are
then calculated and observed, they may fit those prior conceptions or they may not. If they
do not, then the scientist has a new puzzling observation in need of explanation, which of
course would need to be tested via retroduction, deduction, and induction.
Alternatively, it is possible to imagine someone randomly picking 100 variables with
no prior conceptions about those variables, then using a computer to calculate correlation
coefficients, and so on. But this is unlikely to advance our collective scientific knowledge, at
least not very quickly. This is not to say that the OI approach could not be used. It simply
means that if used completely devoid of any guidance on which variables are selected,
the approach is unlikely to be productive. After all, one cannot “stand on the shoulders of
giants” if one cannot find the giants or if one lacks a ladder to climb up on.
Perhaps another point is in order at this time. We are arguing that science begins
with puzzling observations. Certainly, the encounter with a puzzling observation is not
consciously planned. Recall that Galileo was simply using his new telescope to take a
“random walk” around the “heavens.” His initial observations were not designed to test a
hypothesis or a theory. But this does not mean that he did not have prior conceptions about
what he might see. After all, why was his notice of the three points of light near Jupiter
puzzling in the first place? The answer is that his immediate assimilation of those points of
light into his “fixed-stars” conception retroductively led to a contradiction, that is,

If . . . the three points of light near Jupiter are fixed stars, and . . . their sizes, brightness and
positions are compared to each other and to nearby fixed stars, then . . . variations in size,
brightness and position should be random, as is the case for other fixed stars. But . . . “they
seem to be arranged exactly in a straight line, parallel to the ecliptic, and to be brighter than
the rest of the stars.” Therefore . . . the fixed stars hypothesis is contradicted. Or as Galileo
put it, “yet they made me wonder somewhat.”

So the point is that all observations are hypothesis/conception driven (i.e., theory laden).
Those that are not puzzling are simply those that match our expectations (our predictions),
whereas those that are puzzling do not match and may, if attended to, eventually result in a
change (an accommodation) in those conceptions via If/then/Therefore reasoning.

HOW GENERAL IS THE IF/THEN/THEREFORE PATTERN OF
REASONING AND ARGUMENTATION?
In a recent extensive review of argumentative frameworks, Sampson and Clark (2008)
classified frameworks as domain general (i.e., those used to analyze arguments inside or
outside the field of science such as Toulmin’s framework of claims, warrants, backings,
etc.) or domain specific (i.e., those that focus on aspects of arguments specific to science
or its subfields, such as Zohar and Nemet’s framework, which focuses heavily on content and
justification, or Kelly and Takao’s framework, which focuses on epistemic levels of specific
propositions). Interestingly, Sampson and Clark classified the present If/then/Therefore
framework (as discussed in Lawson, 2003) as domain specific. In their words,

It fits the traditional empirical model of hypothesis testing and therefore might apply less
well, for example, in terms of science conducted with archival data sets or observational
contexts such as certain subfields of geology. As a result, the framework is very specific in
terms of the scientific disciplines and contexts to which it applies, but for these disciplines
and contexts it provides a strong structural model to guide instruction and student reasoning.
(pp. 460 – 461)

Is the If/then/Therefore framework really domain specific? Certainly, as we have seen
in Galileo’s case, it can apply to situations in which circumstantial evidence is used to
test hypotheses—where circumstantial evidence is defined as circumstances that according
to common experience are usually linked to the hypothesized cause (e.g., as predicted on
the basis of the orbiting-moons hypothesis, some nights the lights appeared to the east of
Jupiter and some nights they appeared to the west. Furthermore, they always appeared along
a straight line on either side of Jupiter). Indeed, the Grants’ data were also circumstantial
in nature (e.g., G. conirostris did have an intermediate and particularly broad feeding
niche). And as we have seen in the Nirenberg case, the If/then/Therefore framework can
apply to situations in which scientists use experiments (i.e., manipulations of nature) to
test hypotheses. However, Sampson and Clark ask: What about “science conducted with
archival data sets” or “certain subfields of geology”? Can the framework apply here as well?
And can it be found outside of science? To find out let us consider additional examples
starting with historian Jared Diamond’s use of archival data to test hypotheses about the
path of human history.

Jared Diamond’s Use of Archival Data to Test Hypotheses About
Human History
In his Pulitzer Prize–winning book Guns, Germs, and Steel, Jared Diamond (1997) was
puzzled by the way history unfolded on different continents. More specifically: “Why did
wealth and power become distributed as they now are, rather than some other way? For
instance, why weren’t Native Americans, Africans, and Aboriginal Australians the ones
who decimated, subjugated, or exterminated Europeans and Asians” (p. 15)? Diamond
advanced two hypotheses to answer these causal questions.
The first hypothesis, the innate-intelligence hypothesis, claims that differences arose
because of differences in innate intelligence among the races—that is, some people are
innately smarter than others. Consequently, the smarter people developed the technology
and so forth that made it possible for them to dominate. Thus, when cultures came into
contact, the smarter, more technologically advanced people decimated, subjugated, or ex-
terminated the less intelligent people. Alternatively, the second hypothesis, the environment
hypothesis, claims that technological differences arose instead due to environmental differ-
ences. That is, the environment in which people settle dictates what sorts of technological
advances are possible, thus determines which group develops technology and dominates if
and when they meet.
To test these alternatives, Diamond described a “natural experiment” concerning the set-
tlement of Polynesia. Around 1200 BC, a group of people from the Bismarck Archipelago,
north of New Guinea, finally reached and began colonizing the enormously diverse islands
of Polynesia. By about AD 500, the colonization of the islands was mostly complete.
As Diamond put it, “The ultimate ancestors of all modern Polynesian populations shared
essentially the same culture, language, technology, and set of domesticated plants and an-
imals. Hence Polynesian history constitutes an ‘experiment’ allowing us to study human
adaptation. . . ” (p. 55). In other words, Diamond reasoned that because the environmentally
diverse islands were all settled by the same ancestral group, technological differences that
arose from island to island could not be attributed to ancestral differences in innate intelli-
gence because innate intelligence was a variable that had been historically held constant.
Instead, any differences that arose can be attributed to other variables such as the diverse
environments. Accordingly, here is an explicit If/then/Therefore argument that appears to
be behind Diamond’s use of these archival data to test the alternatives:

If . . . the environment hypothesis is correct, and . . . a group of people from the Bismarck
Archipelago settles in the large and relatively favorable environment of New Zealand,
while another group settles in the small and considerably less favorable environment of the
Chatham Islands 500 miles to the east (planned test), then . . . technological advances should
proceed faster and more fully in New Zealand than on the Chatham Islands. Additionally,
if and when the two groups come in contact, the New Zealanders should dominate
the Chatham Islanders (deduced predictions). And . . . during the centuries following the
settlement of New Zealand and the Chatham Islands, the two groups of settlers did in
fact develop in opposite directions. The New Zealanders developed complex technology,
political organization, and intense farming practices while the Chatham Islanders reverted
to a loosely coordinated hunting and gathering society. Further, in December 1835 when
500 armed men from New Zealand arrived on the Chatham Islands, they quickly killed
or enslaved the Chatham Islanders even though the invaders were vastly outnumbered
(archival data). Therefore . . . the environment hypothesis is supported. Further, the innate-
intelligence hypothesis is contradicted because the identical ancestry of both groups of
settlers predicts similar developmental paths (conclusion).

If this account is reasonably accurate, we can conclude that the If/then/Therefore argu-
mentative form is general enough to encompass the use of archival data. Note, however,
whether this argument or any other If/then/Therefore argument should be considered retro-
ductive or deductive depends on when the thinker became aware of the relevant archival
data. Suppose, for example, Diamond generated the following argument before he was
aware of the events of 1835:

If . . . the environment hypothesis is correct, and . . . a group of people from a single location
settle in a large and relatively favorable environment, while another group from the same
location settle on a small and considerably less favorable environment (planned test), then
. . . technological advances should proceed faster and more fully in the first group than in
the second and when the two groups come in contact the first group should dominate the
second (deduced predictions).

Suppose Diamond next sifted through archival data to see whether he could find a
specific case in point. If this was the order of things, then Diamond clearly used a deductive
approach. If, however, Diamond first became aware of the 1835 killing and enslavement
of the Chatham Islanders by the New Zealanders and only then employed the environment
hypothesis to explain that “puzzling observation,” his argument would be retroductive. If
so, he should then deduce similar results and look elsewhere in the archival record to see
whether he can find them.
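The dependence on timing can be made explicit with a small Python sketch. It is our own illustration rather than a formalism drawn from Diamond or Peirce: the class name Argument, its field names, and the known_beforehand flag are all hypothetical. The sketch records the parts of an If/then/Therefore argument and labels the test retroductive when the observation was already known at the time the prediction was generated, and deductive otherwise.

```python
from dataclasses import dataclass


@dataclass
class Argument:
    hypothesis: str         # If . . .
    test: str               # and . . .
    prediction: str         # then . . .
    observation: str        # And . . . (or But . . .)
    matches: bool           # does the observation match the prediction?
    known_beforehand: bool  # was the observation known before the prediction was made?

    def kind_of_test(self) -> str:
        """Label the test by when the thinker became aware of the data."""
        return "retroductive" if self.known_beforehand else "deductive"

    def conclusion(self) -> str:
        """Draw the Therefore step from the match or mismatch."""
        verdict = "supported" if self.matches else "contradicted"
        return f"Therefore ... the hypothesis is {verdict}."


# Hypothetical encoding of the Chatham Islands case, assuming the prediction
# was generated before the events of 1835 were known to the thinker.
chatham = Argument(
    hypothesis="the environment hypothesis is correct",
    test="one ancestral group settles a favorable and an unfavorable environment",
    prediction="the group in the favorable environment dominates on contact",
    observation="the New Zealanders dominated the Chatham Islanders in 1835",
    matches=True,
    known_beforehand=False,
)
print(chatham.kind_of_test())
print(chatham.conclusion())
```

Setting known_beforehand to True for the same case changes only the label, not the conclusion, which mirrors the point that the same argument can be retroductive or deductive depending on the order of events.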

Geological Discovery: What Killed the Dinosaurs?


Recall that Sampson and Clark (2008) concluded that the If/then/Therefore form of argu-
mentation is too specific to encompass all scientific contexts “. . . such as certain subfields
of geology” (p. 461). Thus, let us turn to the research of geologist Walter Alvarez to see
what sort of reasoning and forms of argumentation he used in drawing the conclusion that
a giant meteor and its aftermath killed the dinosaurs some 65 million years ago (also see
Lawson, 2004).
The rapid near extinction of forams found in rock strata during the early 1970s presented
a puzzling observation because it contradicted the longstanding uniformitarian doctrine
that geologic and biologic changes occur gradually. Consequently, Alvarez (1997) began
seeking a catastrophic cause. Alvarez was well aware of meteor impact craters on Earth, the
best example being Meteor Crater in Northern Arizona, as well as the mounting evidence
that the craters covering our moon and other planets were caused by impacting asteroids
and comets. More important, such impact craters are the rule, not the exception. Alvarez
was also aware of two published papers proposing that the dinosaur extinction had been
caused by radiation triggered by the explosion of a nearby star—a supernova.
By 1976, Alvarez began focusing his attention on the KT boundary layer (i.e., the narrow
boundary between the Cretaceous and Tertiary layers). He suspected that the boundary held
the key to the dinosaur extinction and that it could be used to test (via deduction) the more
global uniformitarian versus catastrophic theories. As he put it: “Very rapid deposition of
the clay would suggest a sudden cause for the extinction, but slow deposition would suggest
a gradual mechanism.” (Alvarez, 1997, p. 61)
How then could he find out how long it had taken to deposit the clay? What he needed
was something that had been deposited in the limestone and clay at a constant rate. At this
point, Alvarez enlisted the expertise of his father Luis Alvarez, a physicist at Berkeley. The
elder Alvarez knew that although meteors hit the Earth rarely and at random, meteorite dust,
which contains iridium, falls from outer space at a constant rate across the entire Earth.
Therefore, they came up with a way to indirectly measure the clay’s deposition rate by
measuring the amount of iridium. In other words,

If . . . the extinction of many foram species, and possibly the dinosaurs, was caused by a
catastrophic event (catastrophic-event hypothesis), and . . . the amount of iridium contained
in the clay at the KT boundary layer is measured (imagined test), then . . . a relatively small
amount of iridium should be present—about 0.1 parts per billion (ppb) (predicted result via
deduction). Iridium falls at a constant rate, thus the less iridium in the layer, the less time it
must have taken for deposition. And . . . thanks to Berkeley chemist Frank Asaro, by June of
1978 the initial iridium measurements had been made and they contained another surprise.
Instead of the expected amount of 0.1 ppb, assuming the clay layer had been deposited
slowly, a value of 9 ppb was detected (observed result).
Therefore . . . either the extinction of many foram species, and possibly the dinosaurs, was
not caused by a catastrophic event (conclusion via induction); or perhaps the catastrophic
event itself deposited the unusually large amount of iridium (alternative hypothesis).

Consider Alvarez’s reaction to the huge value of detected iridium:

Where had all the iridium come from? Possibilities quickly sprang to mind: Could it have
come from the supernova that Dale Russell and Wallace Tucker had suggested to explain the
dinosaur extinction? Did it come from an impacting asteroid or comet? Or could there be a
non-catastrophic explanation? Maybe the iridium was deposited from seawater somehow.
Or maybe the Earth had encountered a cloud of interstellar dust and gas. (Alvarez, 1997,
p. 69)

Before investing time and energy in testing these possibilities (i.e., alternative hypothe-
ses), Alvarez needed to know whether the iridium anomaly was restricted to the clay bed
around Gubbio or whether it was a global phenomenon. So he went to the library in search
of other known KT sites. At that time, the only other known site was a seaside cliff called
Stevns Klint in Denmark. Thus, Alvarez set off to visit the Stevns Klint deposits. And on
the basis of the following deductive/inductive argument, he concluded that what he found
there supported the catastrophic-event hypothesis:

If . . . the unusually large amount of iridium in the Gubbio clay layer was caused by a
global catastrophic event (catastrophic-event hypothesis), and . . . the amount of iridium
is measured in the other known KT boundary layer at Stevns Klint (imagined test), then
. . . an unusually high level of iridium should also be found in that layer (predicted result via
deduction). And . . . when Alvarez visited the Stevns Klint deposits, he found that they also
contained a narrow clay layer with an unusually high concentration of iridium (observed
result). Therefore . . . the hypothesis was supported (conclusion via induction) and Alvarez
decided that it was time to think about a global explanation for the anomaly.

Thus, on the basis of this interpretation, we can conclude that the If/then/Therefore
pattern of argumentation is general enough to encompass at least some geological research.
We leave it to others to search for other geological cases that may or may not apply. Next let
us briefly consider the so-called “thought experiments” (i.e., instances in which an entire
“experiment” is conducted in one’s mind) to see whether the same argumentative form
applies.

Thought Experiments
As the name implies, thought experiments take place in one’s thoughts. But this does
not mean that they do not include observed results. They do. But the results of thought
experiments have been observed before the experiment has been mentally conducted.
Consequently, the point of a thought experiment is often to reveal via retroduction that the
hypothesis in question must be wrong. It must be wrong because it leads either to a prediction
that does not match with what we have already observed or to contradictory predictions. In
this sense, thought experiments can also be cast in the form of If/then/Therefore arguments,
which provides additional evidence of the form’s generality.
For example, Galileo conceived of one of the most famous thought experiments in
science. He wondered whether Aristotle’s claim that heavier objects fall faster than lighter
objects was correct. (Actually, heavier objects typically do fall faster than lighter objects
in air, but Galileo’s thought experiment was conducted in an idealized world devoid of
fall-resisting air molecules.) As you will see, Galileo’s retroductive reasoning led to the
conclusion that the mass must not matter because if it does, we end up with contradictory
predictions.

If . . . the rate of fall depends on the mass of the object, and . . . we drop a large, heavy
rock next to a smaller, lighter rock, then . . . the larger, heavier rock should hit the ground
first. Further, if . . . the rate of fall depends on the mass of the object, and . . . we now tie
the two rocks together and drop them, then . . . the tied pair should fall faster than the
heavier rock did alone because the pair is more massive (prediction). However, when
the rocks are tied together and are falling, the lighter, slower falling rock will produce a
drag on the heavier rock and slow it down. This implies that when tied together the rocks
should fall more slowly than the heavier rock alone (contradictory prediction). Therefore
. . . we have two contradictory predictions implying that the rate of fall must not depend
on the mass of the falling objects.
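
The contradiction at the heart of this thought experiment can be made concrete in a few lines of Python. This is only an illustrative sketch: the linear rule relating fall rate to mass, the rock masses, and the mass-weighted average used for the "drag" prediction are all assumptions introduced here for illustration, not part of Galileo's argument as reported above.

```python
# Hypothesis under test (assumed linear form): fall rate grows with mass.
def fall_rate(mass: float) -> float:
    return 1.0 * mass  # arbitrary units; the constant 1.0 is a placeholder

heavy, light = 10.0, 2.0  # hypothetical rock masses
v_heavy = fall_rate(heavy)
v_light = fall_rate(light)

# Prediction A: the tied rocks form one more massive body, so they fall
# faster than the heavy rock alone.
v_tied_as_one = fall_rate(heavy + light)

# Prediction B: the slower light rock drags on the heavy rock, so the pair
# falls at some rate between the two (here, a mass-weighted average).
v_tied_with_drag = (heavy * v_heavy + light * v_light) / (heavy + light)

# The same hypothesis yields "faster than the heavy rock" and
# "slower than the heavy rock" for the same test.
contradictory = v_tied_as_one > v_heavy and v_tied_with_drag < v_heavy
print(contradictory)
```

Any rule in which fall rate increases with mass produces the same clash, which is why the argument rejects the hypothesis without a new observation.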

Engineering and the Wright Brothers’ Invention of the Airplane


Samarapungavan et al. (2006) proposed that research chemists have adopted what they
call an “engineering” research model, as opposed to what they view as the more classic
“hypothetico-deductive theory building model” (p. 470). Although their characterization
of HD science shares little with the view advanced in this paper, their selection of an
engineering research model to characterize chemical research is of interest in the sense
that one might suspect that engineers employ the same If/then/Therefore reasoning pattern
during the invention process that we are arguing is used during scientific discovery. In other
words, in terms of reasoning, engineers and scientists may be doing the same thing. As
a case in point, let us consider what happened in 1900 when bicycle builders Orville and
Wilbur Wright tried their hand at building an airplane (as cited in Crouch, 1992).
The Wright brothers began by planning to build a small, unmanned glider. To calculate
the size that the glider’s wings would need to develop the necessary lift for flight, they used
an equation called the lift equation. According to the lift equation, the amount of lift created
($L$) depends on the total area of the lifting surface ($S$), the velocity of the flight squared
divided by 2 ($V^2/2$), a coefficient of air pressure ($k$), and a coefficient of lift ($C_L$), that is,

$$L = \frac{k S V^{2}}{2}\, C_L$$
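
To make the role of the equation concrete, the following Python sketch computes lift as the product described in the text (air pressure coefficient, surface area, velocity squared divided by 2, and lift coefficient) and inverts it to size a wing. All numerical values are hypothetical placeholders chosen for easy arithmetic, not the Wrights' figures, and the function names are introduced here for illustration only.

```python
def lift(k: float, S: float, V: float, C_L: float) -> float:
    """Lift as described in the text: k * S * (V**2 / 2) * C_L."""
    return k * S * V**2 / 2 * C_L

def required_area(target_lift: float, k: float, V: float, C_L: float) -> float:
    """Wing area needed for a target lift, solving the same equation for S."""
    return 2 * target_lift / (k * V**2 * C_L)

# Hypothetical design values (units left unspecified on purpose).
k_assumed, V, C_L, target = 0.005, 20.0, 0.5, 200.0
S = required_area(target, k_assumed, V, C_L)

# If the true coefficient were 20% smaller than the one used in the design,
# the wing built to these specifications would fall 20% short of its target.
k_true = 0.8 * k_assumed
shortfall = 1 - lift(k_true, S, V, C_L) / target
print(S, shortfall)
```

Because lift is linear in each coefficient, a proportional error in a coefficient shows up as the same proportional error in lift, which is consistent with suspecting the coefficients when a carefully built glider underperforms.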

After using the lift equation to calculate the necessary specifications, the brothers built
their glider to these specifications, and in October of 1900 took it to Kitty Hawk, North
Carolina, for testing. The test results were encouraging enough to motivate them to build a
larger manned glider and try it the next year. During that next year, the larger manned glider
made several test flights—the longest 389 ft. However, the tests were largely discouraging
because the manned glider failed to attain the needed lift for eventual self-propelled flight
by some 20%. Thus, a puzzling observation provoked a causal question, namely: Why
did the manned glider not attain the needed lift? Presumably, based on the following
If/then/Therefore reasoning, they concluded that their failure to achieve the necessary lift
was due to faulty specifications used in the lift equation, that is,

If . . . the specifications used in the lift equation are correct, and . . . we build and fly a larger
manned glider to those specifications and determine its amount of lift, then . . . it should
achieve the needed lift for eventual self-propelled flight. But . . . when the manned glider
was taken to Kitty Hawk and tested it did not attain the necessary lift—by some 20%.
Therefore . . . the specifications are probably not correct.

But what in the specifications contained the error? The Wright brothers hypothesized
that the error likely existed in the coefficients used in their calculations (i.e., in the air
pressure coefficient, in the lift coefficient, or in both). They then figured out a way to test
their hypotheses by using a moving bicycle with a spare wheel, free to turn, mounted on its
handlebars.
To deduce the necessary prediction for their test, they used the lift equation and the two
previously used coefficients to calculate that a wing with a surface area of 1 square foot, set
at a 5◦ angle, should precisely balance a flat plate measuring 0.66 of a square foot, set at a
90◦ angle to the air flow. Consequently, to conduct the test, they mounted the spare wheel
on the handlebars; they fixed the 1 square foot wing on the front of the spare wheel at a 5◦
angle; they fixed the 0.66 square foot flat plate on the spare wheel at a 90◦ angle to the air
flow; and they rode the bicycle with its spare wheel, wing, and flat plate down the street.
Based on their calculations, the forces created on the wing and on the flat plate should
precisely balance each other and the spare wheel should not turn. However, when they rode
down the street, the wheel turned. So the error could not be in the surface areas, the actual
lift, or in the velocity. Therefore, they could be reasonably sure that the coefficients used
in the calculations were in fact to blame. Their reasoning can be summarized using the
If/then/Therefore form like this:

If . . . no error exists in the two coefficients, and . . . a bicycle with the spare wheel, a wing,
and a flat plate mounted as described above is ridden down the street, then . . . the forces
exerted by the wind on the wing and on the flat plate should precisely balance and the spare
wheel should not turn. But . . . when the bicycle was ridden down the street, the spare wheel
turned. Therefore . . . the hypothesis is contradicted. In other words, an error must exist in
the coefficients.

So the Wright brothers set out to find out which coefficient was to blame. To do this,
they built a wind tunnel and used a small airfoil mounted on a balance to conduct several
additional hypothesis-driven experiments that soon led to the construction of the first
successful airplane in 1903.

CONCLUSION AND IMPLICATIONS


This paper has argued that the core of scientific reasoning, argumentation, and discov-
ery consists of four inferences called abduction, retroduction, deduction, and induction.
Abduction is first used to generate possible explanations for puzzling observations. Next,
retroduction, deduction, and induction drive a pattern of If/then/Therefore reasoning used
in the service of testing these explanations. Evidence in the form of several case histories
has been presented that the inferences and the reasoning pattern are of general applicability.
Recall Peirce’s previously quoted words: “In forty years diligent study of arguments, I have
never found one which did not consist of these elements” (Bergman & Paavola, 1905/2003a,
CP 8.209). Indeed, the present paper is another case in point (e.g., If . . . the present theory
of the nature of scientific reasoning and argumentation is correct, and . . . several varied
cases of scientific discovery are carefully analyzed, then . . . they should reveal the use of
abduction, retroduction, deduction, and induction. And . . . several varied cases do reveal
use of these inferences. Therefore . . . the theory is supported).
Accordingly, science can be viewed as an enterprise in which explorations, which need
not be consciously preceded by prior hypotheses/theories, yield puzzling observations
in need of explanation. Scientists then subconsciously cull through their prior declarative
knowledge to abductively generate one or more tentative explanations. This can happen very
quickly or it can take several years. Either way, once generated, the tentative explanations are
initially and subconsciously put to a retroductive test. Does the explanation in fact explain
the initial puzzling observation? Upon passing such a test, one may incorrectly conclude that his
or her task is complete. But it is not. Now, new tests should be imagined that lead deductively
to predictions about possible new observations. Once such tests have been conducted
and the new observations made, they need to be compared with the predictions. A good
match inductively supports the tested explanation. A poor match inductively contradicts
the explanation. Importantly, as mentioned (see footnote 2), the underlying reasoning is not
strictly “logical” or “procedural” in the sense that prior declarative knowledge is needed,
not only as a source of hypotheses but also as a source of imagined tests and predictions.
John Platt (1964) expressed a similar view in his now classic paper “Strong Inference.”
In that paper, Platt defined strong inference like this:

Strong inference consists of applying the following steps to every problem in science,
formally and explicitly and regularly:

1. Devising alternative hypotheses;
2. Devising a crucial experiment (or several of them), with alternative possible outcomes, each
of which will, as nearly as possible, exclude one or more of the hypotheses;
3. Carrying out the experiment so as to get a clean result;
4. Recycling the procedure, making sub-hypotheses to refine the possibilities that remain; and
so on. (p. 347)

Although Platt noted that some research fields collectively embrace these steps, he also
noted that the steps were neither universally understood nor applied. Again in his words,

The difference between the average scientist’s informal methods and the methods of strong-
inference users is somewhat like the difference between a gasoline engine that fires occa-
sionally and one that fires in steady sequence. If our automobile engines were as erratic as
our deliberate intellectual efforts, most of us would not get home for supper. (p. 347)

Therefore, although it would seem incorrect to argue that all scientific research is
consciously guided by cycles of abductive, retroductive, deductive, and inductive infer-
ences, it might, nevertheless, be argued that the odds of success would improve if they
were consciously applied. Doing so, however, would require that researchers become more
aware of their reasoning in successful instances so that they are better able to repeat this
successful reasoning in subsequent instances. In cognitive terms, the consciousness issue
appears to be one of “metacognition”—a term coined by Flavell (1979). Metacognition
literally means thinking about one’s thinking, thus refers to an individual’s ability to stand
apart from his or her own thinking, reflect on it, and subsequently improve it. Thus,
in cognitive terms, what Platt is calling for is more reflectivity, more metacognition, on
the part of researchers. Increased reflectivity would presumably lead to a greater aware-
ness/consciousness of the reasoning process so that they might waste less time gathering
irrelevant data and instead would more quickly move to the explicit generation and retro-
ductive and then deductive tests of alternative hypotheses and predictions. This view is
consistent with more recent so-called “dual-processing” accounts of reasoning and social
cognition, which posit the existence of cognitive processes that are fast, automatic, and
unconscious and those that are slow, deliberate, and conscious (e.g., Evans, 2008).
Platt’s argument for raising consciousness among scientists can be applied in the sci-
ence classroom as well. In short, students need to engage in more lessons in which they
have opportunities to explore nature and confront puzzling observations and the resulting
causal questions. They then need a skilled teacher who allows and encourages them to
generate and test alternative hypotheses and then reflect on what they have done, thus
“exercise” and become more conscious of their nascent inferential skills. Many science
educators have previously expressed a similar view with various degrees of explicitness.
For example, Berland and Reiser (2009) recently explored the usefulness of a framework
proposed by McNeill and Krajcik (2007). The McNeill–Krajcik framework contains these
three components: (1) Claim—the answer to the question, the piece to be defended by
evidence and reasoning; (2) Evidence—information or data that supports the claim; and (3)
Reasoning—a justification that shows why the data count as evidence to support the claim.
The McNeill–Krajcik framework has elements in common with the present framework.
However, it lacks some of the present framework’s explicitness and completeness.
Also consider the Predict-Observe-Explain (POE) framework proposed by White and
Gunstone (1992). During POE instruction students are first asked to predict the outcome
of some sort of exploration or manipulation and then asked to justify their prediction. This
is usually done in an area in which they are likely to generate a false prediction based
on a misconception. Students then make the relevant observation, usually of a discrepant
event that contradicts their prediction. Finally, they are asked to explain the discrepancy
in an effort to change their misconception. Viewed in terms of the present theory, we can
interpret a student’s justification, their misconception, as an alternative hypothesis that
deductively generated their previously stated prediction. Thus, the subsequent observation,
which does not match their prediction, contradicts their hypothesis and leads to the need to
generate an alternative hypothesis (an alternative conception) that retroductively generates
a prediction that matches what they have just observed. The only elements missing from
this POE framework (albeit three very important ones) are the need for students to then
(1) plan some new tests of the alternatives based on deduction, (2) conduct the tests and
compare the test results with their deductively derived predictions, and (3) use induction to
conclude that the alternatives have been supported or contradicted, thus replace their prior
misconception with a more scientifically acceptable one.
Specifically, when put into practice, either in scientific research or in the science class-
room, the present framework distinguishes among an argument’s declarative elements (i.e.,
puzzling observations, causal questions, hypotheses, planned tests, predictions, conducted
tests, results, and conclusions) and its procedural elements (i.e., abduction, retroduction,
deduction, and induction). Furthermore, the present framework details how the declara-
tive and procedural elements interact in the following manner: (1) An exploration phase
occurs in which a puzzling observation is made; (2) a causal question is raised; (3) a cre-
ative brainstorming phase occurs in which multiple hypotheses are abductively generated;
(4) next, a phase occurs in which tests are planned that retroductively and later deductively
lead to explicitly stated predictions; (5) evidence is then gathered that at least “in theory”
might contradict each hypothesis; (6) predictions and evidence are compared to allow, via
induction, the drawing of a conclusion; and (7) oral and/or written arguments are prepared
and presented that include the evidence and the If/then/Therefore reasoning for and against
each of the hypotheses.
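
One way to keep the declarative elements distinct when writing up such an argument is to store each one in its own field. The Python sketch below is a hypothetical illustration, not a tool used in the studies cited here: the class name, field names, and the boolean `matches` flag are assumptions, and the example restates the manned-glider argument given earlier.

```python
from dataclasses import dataclass

@dataclass
class IfThenThereforeArgument:
    hypothesis: str      # tentative explanation being tested
    planned_test: str    # imagined or conducted test
    prediction: str      # expected result, deduced from hypothesis + test
    result: str          # observed result
    matches: bool        # does the observed result match the prediction?

    def conclusion(self) -> str:
        # A match supports the hypothesis; a mismatch contradicts it.
        return "supported" if self.matches else "contradicted"

    def render(self) -> str:
        connective = "And" if self.matches else "But"
        return (
            f"If . . . {self.hypothesis}, and . . . {self.planned_test}, "
            f"then . . . {self.prediction}. {connective} . . . {self.result}. "
            f"Therefore . . . the hypothesis is {self.conclusion()}."
        )

glider = IfThenThereforeArgument(
    hypothesis="the specifications used in the lift equation are correct",
    planned_test="we build and fly a larger manned glider to those specifications",
    prediction="it should achieve the needed lift",
    result="the manned glider did not attain the necessary lift",
    matches=False,
)
print(glider.render())
```

Keeping `hypothesis` and `prediction` in separate fields makes it harder to submit one in place of the other.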
Unfortunately, in terms of implementing such lessons, a recent survey (Oehrtman &
Lawson, 2008) found that a majority of experienced high school science teachers (63%)
were unaware of the distinction between hypotheses and predictions. Perhaps, even worse,
41% of them failed to distinguish evidence from conclusions. Such lack of awareness
is also common in instructional materials. For example, in a published set of high school
physical science lessons, Hsu (2005) defines a hypothesis as “A sentence describing what
you think your experiment should demonstrate” (p. 9). And in a series of published high
school general science lessons, Cothron, Giese, and Rezba (2006) offer this definition
and example: “A hypothesis is a prediction of the effect that changes in the independent
variable will have on the dependent variable. One possible hypothesis would be: If the
amount of salt in the water is increased, then the water will evaporate more slowly”
(p. 45). Failing to differentiate hypotheses from predictions in this way not only loses
the “logic” of hypothesis testing but also loses the central goal of doing science, which
is to generate and test explanations. Small wonder so many teachers and students are
perplexed.
Indeed, many, if not most, published lessons fail to begin with puzzling observations
in need of explanation. For example, who among us has not seen a lesson similar to one
recently sent to me by a curriculum developer from a nearby school district? The lesson
begins with students observing different types of birdseed. Students are then asked to
generate a hypothesis about which type they think birds would prefer and then test their
hypothesis. Unfortunately, there is nothing to explain here—no puzzling observation and
no causal question. Consequently, there is no need for hypotheses. At best, the lesson
calls for students to make predictions about what type the birds might prefer. Most likely,
students will have no idea why, or even whether, birds might prefer one type over another.
Nevertheless, perhaps some accommodating student will make a prediction (recall White
and Gunstone’s POE instructional framework). Having done so, the alert teacher can then
ask the student to explain why the student made the prediction. If the student can then offer
a possible reason for the prediction (e.g., I think birds will prefer the type containing lots
of little yellow seeds because those seeds are easier to crack open), the teacher can identify
such a reason as a hypothesis, which could subsequently be tested.

Thus, more lessons are needed that begin with students making puzzling observations
that can then be collaboratively and collectively explained via hypothesis generation and
test. For example, Lawson (2002b) conducted such a lesson with college students who
were challenged to generate and test multiple hypotheses about why water rose in a glass
inverted over a burning candle that was standing in a pan of water. A key aspect of this and
similar lessons is making sure that students freely generate several hypotheses. Initially,
however, many students are reluctant to generate a hypothesis for fear of being wrong.6
This is particularly so if the teacher allows hypotheses to be “critiqued” following their
generation. For example, students often generate the hypothesis that water rises because
oxygen is being consumed by the flame and the resulting vacuum “sucks” the water up. At
hearing such a hypothesis, an alert classmate having used retroductive reasoning may be
tempted to exclaim: “That cannot be right. If it were right, then the water should stop rising
after the candle goes out. But we saw that it doesn’t!” Or another equally alert classmate
may retroductively add: “That cannot be right. If it were right, then we would have
destroyed oxygen. But we know from chemistry class that combustion does not destroy
oxygen. Instead it converts it into carbon dioxide.”
Allowing these sorts of retroductive critiques during the hypothesis-generation phase of
instruction severely restricts the number of alternatives abducted. Consequently, following
good brainstorming techniques, retroductive arguments should be put on hold until students
have generated all of the hypotheses they can think of. Only then should the teacher
challenge students to use both retroductive and deductive reasoning to test the alternatives.
Students should also be told to try to test all of the generated hypotheses, not just the ones
they think might be right. In short, to produce the strongest argument, their job should
be one of not only finding evidence in favor of one hypothesis but also finding evidence
against the alternatives. Teachers should also point out that the “correct” answer may be
some combination of the generated hypotheses, or perhaps a hypothesis that has yet to be
generated.
Following much sharing of ideas, much experimentation, and much argumentation,
some of the students who participated in the candle burning lesson described above were
successful in constructing verbal and then written If/then/Therefore arguments summarizing
how they had deductively tested each hypothesis and what conclusions they were able to
draw, for example:

If . . . the water rises because carbon dioxide molecules dissolve rapidly into the water
(hypothesis), and . . . the height of water rise in two containers is compared—one with CO2
saturated water and one with normal water (planned test), then . . . the water should rise less
in the container with the CO2 saturated water than in the container with the normal water
(prediction). But . . . the water rise is the same in both containers (result). Therefore . . . the
dissolving-CO2 hypothesis is probably wrong (conclusion).

Success in conducting such a test and in constructing such an argument implies that
these students reasoned in a context in which the hypothesized causal agent (dissolving
CO2 molecules) was nonperceptible. Furthermore, to link the imagined causal agent to the
experimental manipulation (i.e., the amount of dry ice in the two containers), the students
presumably had to understand a theoretical rationale that goes something like this:

6 To encourage multiple hypothesis generation, teachers need to ask divergent, rather than convergent,
questions. For example, students are much more willing to venture a “guess” if asked “What might have
caused the water to rise?” as opposed to “What caused the water to rise?”


TABLE 2
Retroductive Arguments Constructed by Students While Attempting to
Test Hypotheses (from Lawson, 2002b)

Testing the dissolving-CO2 hypothesis
If . . . the oxygen is converted to carbon dioxide, and . . . the carbon dioxide dissolves
in the water, then . . . the inside pressure should be less than outside causing water
to rise. And . . . the water did rise. Therefore . . . the hypothesis is correct due to
rising of the water.

Testing the expanding-water hypothesis
If . . . water absorbs heat from the flame, and . . . that causes water to expand, then
. . . we should see the water rise. And . . . it does. Therefore . . . the hypothesis is not
disproved.

Testing the consumed-oxygen hypothesis
If . . . oxygen is consumed creating a partial vacuum, and . . . it causes a vacuum into
which the water is sucked, then . . . the water level should rise, which it does.
Therefore . . . the hypothesis is supported.

Testing the phlogiston hypothesis
If . . . the candle is lit before covering it with the jar, then . . . the water should rise
when the flame (phlogiston) goes out, and . . . the water did rise. Therefore . . . the
hypothesis is supported.

Dissolving CO2 molecules presumably cause a reduction of air pressure in the cylinder. This
reduction in turn causes the water to rise. Consequently, when the water is already saturated
with CO2 molecules, the newly created CO2 molecules cannot escape into the water, hence
the internal pressure will not be reduced and the water will not rise.

The theoretical rationale in this case is used to link the imagined causal agent (i.e.,
dissolved CO2 molecules) to the manipulated (i.e., independent) variable in the experiment
(i.e., the amount of dry ice added to the two containers). Yet, several other students could
do no better than generate retroductive arguments such as those listed in Table 2. The argu-
ments in the table certainly suggest that these students failed to understand the limitations
of retroductive reasoning and failed to appreciate the need for deductive tests with clearly
stated predictions. Nevertheless, retroductive reasoning can be very important—recall the
retroductive nature of thought experiments. Also consider Albert Einstein’s general relativity
theory, which in 1915 retroductively explained the puzzling 43 arcseconds per century
shift in Mercury’s orbit. Importantly, the theory also deductively predicted that starlight
passing the sun would be displaced outward by 1.7 arcseconds—a prediction that was sub-
sequently confirmed by astronomical observations made in 1919. When a graduate student
later asked Einstein what he would have done had the observations (made by Sir Arthur
Eddington) shown his theory wrong, he replied: “Then I would have been sorry for the
dear Lord (referring to Eddington); the theory is correct” (Isaacson, 2007, p. 259).
Presumably, Einstein was speaking somewhat in jest. In fact, when subsequent obser-
vations made by Edwin Hubble in the 1920s contradicted another prediction of general
relativity theory (i.e., the universe is not expanding), Einstein was quick to modify the
theory to take Hubble’s result into account. In this instance, however, modification was
relatively easy because Einstein’s original version of the theory had in fact predicted an
expanding universe. But at the time the theory was generated, evidence implied a non-
expanding universe. Accordingly, Einstein added a “cosmological constant” to his field
equations to keep the theory consistent with a static universe. Later, Einstein would call
this addition “the biggest blunder he ever made in his life” (Isaacson, 2007, pp. 355–356).
It was a huge blunder because had the constant not been added, the theory would have
been left predicting an expanding universe—a prediction that would have been confirmed
several years later making Einstein all the more famous.
Interestingly, when the students who constructed the retroductive arguments listed in
Table 2 were asked to generate and test hypotheses about the possible cause(s) of vari-
ation in the speed of a pendulum’s swing (i.e., What causes some pendulums to swing
faster than others?), they had no problem in doing so and in later constructing deductive
If/then/Therefore arguments like this one:

If . . . the amount of weight causes changes in swing rates, and . . . the weights are varied
while holding other possible causes constant, then . . . rate of pendulum swing should vary.
But . . . when we conducted the experiment, we found that the rates did not vary. Therefore
. . . the weight hypothesis is contradicted.
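
The students' result agrees with the standard idealized model of a simple pendulum, in which the small-angle period depends on length and gravitational acceleration but not on mass. The Python sketch below assumes that idealized model, a value of 9.81 m/s^2 for g, and hypothetical masses and lengths. The length trial is added here only as a contrast and is not part of the lesson described above.

```python
import math

G = 9.81  # m/s^2, assumed gravitational acceleration

def period(length_m: float, mass_kg: float) -> float:
    """Small-angle period of an ideal pendulum.

    mass_kg is accepted so it can be varied as the independent variable,
    but it does not enter the formula.
    """
    return 2 * math.pi * math.sqrt(length_m / G)

def varies(values, tol: float = 1e-9) -> bool:
    return max(values) - min(values) > tol

def conclude(predicted_to_vary: bool, observed_to_vary: bool) -> str:
    # A match between prediction and observation supports the hypothesis.
    return "supported" if predicted_to_vary == observed_to_vary else "contradicted"

# Vary weight, hold length constant.
weight_trial = [period(1.0, m) for m in (0.1, 0.5, 2.0)]
# Vary length, hold weight constant.
length_trial = [period(l, 0.5) for l in (0.25, 1.0, 2.0)]

weight_conclusion = conclude(True, varies(weight_trial))
length_conclusion = conclude(True, varies(length_trial))
print(weight_conclusion, length_conclusion)
```

In both trials the hypothesized cause is the manipulated variable itself, so the prediction follows without an intervening theoretical rationale.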

Although this pattern of argumentation is the same as that used to deductively test
the dissolving CO2 hypothesis, here a theoretical rationale is not needed because the test
involves an experiment in which the possible cause is directly manipulated. In other words,
the proposed cause is the amount of weight and the experiment’s independent variable
also is the amount of weight. Importantly, this variable can be easily manipulated because
weight differences can be sensed. Thus, causal hypothesis testing appears to occur on
two qualitatively different levels, with success at testing hypotheses involving perceptible
causal agents as a likely prerequisite for becoming proficient at testing hypotheses involving
nonperceptible theoretical entities. Thus, students may first become generally skilled at
testing hypotheses about perceptible causal agents. And, perhaps, only then, given the
necessary developmental conditions, do they become generally skilled at testing hypotheses
about nonperceptible causal agents (cf., Lawson et al., 2000).
Consequently, teachers should provoke students to construct, reflect on, and then try to
produce written arguments of what they have done in pendulum-like contexts before asking
them to do so in contexts in which the hypothesized causal agents are nonperceptible. For
example, consider the question that Sampson and Clark (2008) posed to students, that is,
Why do some objects, such as a metal and a wooden spoon, feel like they are at different
temperatures even though they have been sitting in the same room for several hours? Here,
at least two levels of responses are possible. The first level can be provoked by first asking
students to feel several objects and report which ones feel colder, warmer, and so on.
Upon doing so, students will report that some objects (e.g., metal ones) feel colder than
other objects (e.g., wooden ones). These observations raise a causal question: Why do
metal objects feel colder than wooden objects? Students can then generate some alternative
hypotheses: for example, metal objects feel colder because they are colder. They can test
this hypothesis by measuring the temperatures of the objects in question: If . . . metal
objects feel colder than wooden objects because they are colder, and . . . we measure the
temperatures of the metal and the wooden objects, then . . . the measured temperatures
of the metal objects should be lower. Of course the students’ results will contradict the
hypothesis: that is, but . . . the temperatures of the metal and wooden objects are the same.
Therefore . . . the hypothesis is contradicted. So the students will now have encountered a
real puzzling observation, namely, some objects feel colder than others in spite of the fact
that they are at the same temperature!
This puzzling observation raises a second, higher level, causal question, to which students
can again be asked to generate hypotheses. However, at least for the middle school students
interviewed by Sampson and Clark, the sorts of hypotheses needed here and their means
of testing are probably beyond their reach. Nevertheless, here is one hypothesis and a way
to test it: If . . . metal objects feel colder than wooden objects at any given temperature
because the metals’ atoms are packed closer together—hence conduct heat better—hence
feel colder, and . . . we measure and compare the densities of the objects, then . . . the metal
objects should have greater densities than the wooden objects. Of course, upon measuring
and comparing densities, the students will find that, as predicted, the metals are denser.
Therefore . . . they can conclude that the hypothesis has been supported.
Unfortunately, developing many such hypothesis-driven, inquiry-based lessons and prop-
erly matching the lessons’ intellectual demands with the students’ initial reasoning skills
and their declarative knowledge remains an unmet educational challenge.7 A related un-
met challenge is educating teachers so that they (1) understand the underlying patterns of
reasoning and argumentation and (2) understand how best to teach such lessons so that
students become better able to abductively generate and then test alternative hypotheses
using retroduction, deduction, and induction.

The author thanks John Alcock for several helpful comments during the preparation of the manuscript.
Any opinions, findings, and conclusions or recommendations expressed in this publication are those
of the author and do not necessarily reflect the views of the National Science Foundation.

REFERENCES
Allchin, D. (2006). Lawson’s shoehorn—Reprise. Science & Education, 15, 113 – 120.
Alters, B. J. (1997). Whose nature of science? Journal of Research in Science Teaching, 34(1), 39 – 55.
Alvarez, W. (1997). T. rex and the crater of doom. Princeton, NJ: Princeton University Press.
American Association for the Advancement of Science. (1989). Project 2061: Science for all Americans.
Washington, DC: Author.
American Association for the Advancement of Science. (2007). Atlas of scientific literacy (Vol. 2). Washington,
DC: Author.
Bergman, M., & Paavola, S. (Eds.). (2003a). The Commens dictionary of Peirce’s terms. Retrieved from
http://www.helsinki.fi/science/commens/dictionary.html. (Reprinted from A letter to Calderoni, by C. S. Peirce,
1905)
Berland, K. K., & Reiser, B. J. (2009). Making sense of argumentation. Science Education, 93(1), 26 – 55.
Biela, A. (1993). Psychology of analogical inference. Stuttgart, Germany: S. Hirzel Verlag.
Bonner, J. J. (2005). Which scientific method should we teach & when? The American Biology Teacher, 67(5),
262 – 264.
Brannigan, A. (1981). The social basis of scientific discoveries. Cambridge, England: Cambridge University Press.
Collins, H. M. (1985). Changing order. London: Sage.
Cothron, J. H., Giese, R. N., & Rezba, R. J. (2006). Students and research. Dubuque, IA: Kendall/Hunt.
Crick, F. H. C., Barnett, F. R. S. L., Brenner, S., & Watts-Tobin, R. J. (1962). General nature of the genetic code
for proteins. Nature, 192, 1227 – 1232.
Crouch, T. D. (1992). Why Wilbur and Orville? Some thoughts on the Wright brothers and the process of invention.
In R. J. Weber & D. N. Perkins (Eds.), Inventive minds (pp. 80 – 96). New York: Oxford University Press.
Darwin, C. (1898). The origin of species (7th ed.). New York: Appleton & Company.
Diamond, J. (1997). Guns, germs, and steel. New York: Norton.
Educational Policies Commission. (1961). The central purpose of American education. Washington, DC: National
Education Association of the United States.

7
In terms of learning cycle instruction, lessons in which students generate and deductively test alternative
hypotheses have been called hypothetico-deductive or hypothetical-predictive learning cycles (e.g., Lawson,
1995, 2009b; Lawson, Abraham, & Renner, 1989). Learning cycles in which students explore nature
and simply identify patterns and/or make puzzling observations without generating possible explanations
have been called descriptive learning cycles. And learning cycles in which students confront puzzling
observations and generate possible explanations, but test them only with previously gathered data, have
been called empirical-abductive (i.e., retroductive) learning cycles. Thus, the three types of learning cycles
represent segments along a continuum from descriptive to experimental science. As such, they place
differing demands on student initiative, knowledge, and reasoning skill.

Educational Policies Commission. (1966). Education and the spirit of science. Washington, DC: National Educa-
tion Association of the United States.
Erduran, S., Simon, S., & Osborne, J. (2004). TAPping into argumentation: Developments in the application of
Toulmin’s argument pattern for studying science discourse. Science Education, 88(6), 915 – 933.
Evans, J. S. (2008). Dual-processing accounts of reasoning, judgment and social cognition. Annual Review of
Psychology, 59, 255 – 278.
Finke, R. A., Ward, T. B., & Smith, S. M. (1992). Creative cognition: Theory research and practice. Cambridge,
MA: The MIT Press.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry.
American Psychologist, 34, 306 – 326.
Gentner, D. (1989). The mechanisms of analogical learning. In S. Vosniadou & A. Ortony (Eds.), Similarity and
analogical reasoning. Cambridge, England: Cambridge University Press.
Giere, R. N., Bickle, J., & Mauldin, R. F. (2006). Understanding scientific reasoning (5th ed.). Belmont, CA:
Thomson Higher Education.
Grant, P. R. (1986). Ecology and evolution of Darwin’s finches. Princeton, NJ: Princeton University Press.
Grant, B. R., & Grant, P. R. (1989). Evolutionary dynamics of a natural population: The large cactus finch of the
Galapagos. Chicago: University of Chicago Press.
Hanson, N. R. (1958). Patterns of discovery. London: Cambridge University Press.
Hempel, C. (1966). Philosophy of natural science. Upper Saddle River, NJ: Prentice-Hall.
Holyoak, K. J. (2005). Analogy. In K. J. Holyoak & R. G. Morrison (Eds.), The Cambridge handbook of thinking
and reasoning (pp. 117 – 142). New York: Cambridge University Press.
Hsu, T. (2005). Foundations of physical science investigations (2nd ed.). Peabody, MA: CPO Science.
Isaacson, W. (2007). Einstein: His life and universe. New York: Simon & Schuster.
Koestler, A. (1964). The act of creation. London: Hutchinson.
Lawson, A. E. (1995). Science teaching and the development of thinking. Belmont, CA: Wadsworth.
Lawson, A. E. (2002a). What does Galileo’s discovery of Jupiter’s moons tell us about the process of scientific
discovery? Science & Education, 11(1), 1 – 24.
Lawson, A. E. (2002b). Sound and faulty arguments generated by pre-service biology teachers when testing
hypotheses involving un-observable entities. Journal of Research in Science Teaching, 39(3), 237 – 252.
Lawson, A. E. (2003). The nature and development of hypothetico-predictive argumentation with implications
for science teaching. International Journal of Science Education, 25(11), 1387 – 1408.
Lawson, A. E. (2004). T. rex, the crater of doom, and the nature of scientific discovery. Science & Education, 13,
155 – 177.
Lawson, A. E. (2005). What is the role of induction and deduction in reasoning and scientific inquiry? Journal of
Research in Science Teaching, 42(6), 716 – 740.
Lawson, A. E. (2006a). Allchin’s errors and misrepresentations and the H-D nature of science. Science Education,
90(2), 289 – 292.
Lawson, A. E. (2006b). Developing scientific reasoning patterns in college biology. In J. J. Mintzes & W. H.
Leonard (Eds.), Handbook of college science teaching: Theory, research, and practice (pp. 109 – 118).
Washington, DC: National Science Teachers Association.
Lawson, A. E. (2009a). On the hypothetico-deductive nature of science—Darwin’s finches. Science & Education,
18(1), 119 – 124.
Lawson, A. E. (2009b). Teaching inquiry science in middle and secondary schools. Thousand Oaks, CA:
Sage.
Lawson, A. E., Abraham, M. R., & Renner, J. W. (1989). A theory of instruction: Using the learning cycle to teach
science concepts and thinking skills. Cincinnati, OH: National Association for Research in Science Teaching.
Lawson, A. E., Clark, B., Cramer-Meldrum, E., Falconer, K. A., Kwon, Y. J., & Sequist, J. M. (2000). The
development of reasoning skills in college biology: Do two levels of general hypothesis-testing skills exist?
Journal of Research in Science Teaching, 37(1), 81 – 101.
Lawson, D. I., & Lawson, A. E. (1993). Neural principles of memory and a neural theory of analogical insight.
Journal of Research in Science Teaching, 30(10), 1327 – 1348.
Mahootian, F., & Eastman, T. E. (in press). Complementary frameworks of scientific inquiry: Hypothetico-
deductive, hypothetico-inductive, and observational inductive. World Futures: The Journal of General Evolution.
McNeill, K. L., & Krajcik, J. (2007). Middle school students’ use of appropriate and inappropriate evidence in
writing scientific explanations. In M. C. Lovett & P. Shah (Eds.), Thinking with data: The proceedings of the
33rd Carnegie Symposium on Cognition (pp. 233 – 265). Mahwah, NJ: Erlbaum.
Misak, C. (2004). Charles Sanders Peirce (1839 – 1914). In C. Misak (Ed.), The Cambridge companion to Peirce.
Cambridge, England: Cambridge University Press.
National Research Council. (1990). Fulfilling the promise: Biology education in the nation’s schools. Washington,
DC: National Academies Press.

National Research Council. (1996). National Science Education Standards. Washington, DC: National Academies
Press.
National Research Council. (2001). Educating teachers of science, mathematics, and technology. Washington,
DC: National Academies Press.
Newton, P., Driver, R., & Osborne, J. (1999). The place of argumentation in the pedagogy of school science.
International Journal of Science Education, 21, 553 – 576.
Nirenberg, M. W., & Matthaei, J. H. (1961). The dependence of cell-free protein synthesis in E. coli upon naturally
occurring or synthetic polyribonucleotides. Proceedings of the National Academy of Sciences of the United
States of America, 47(10), 1580 – 1588.
Oehrtman, M., & Lawson, A. E. (2008). Connecting science and mathematics: The nature of proof and disproof
in science and mathematics. International Journal of Science and Mathematics Education, 6(2), 377 – 403.
Planck, M. (1949). Scientific autobiography (E. Guynor, Trans.). New York: Philosophical Library.
Platt, J. R. (1964). Strong inference. Science, 146, 347 – 353.
Polya, G. (1954). Patterns of plausible inference. Princeton, NJ: Princeton University Press.
Popper, K. (1965). Conjectures and refutations: The growth of scientific knowledge. New York: Basic Books.
Samarapungavan, A., Westby, E. L., & Bodner, G. M. (2006). Contextual epistemic development in science: A
comparison of chemistry students and research chemists. Science Education, 90(3), 468 – 495.
Sampson, V., & Clark, D. B. (2008). Assessment of the ways students generate arguments in science education:
Current perspectives and recommendations for future directions. Science Education, 92(3), 447 – 472.
Schick, T. S., Jr., & Vaughn, L. (1995). How to think about weird things. Mountain View, CA: Mayfield.
Shapley, H., Rapport, S., & Wright, H. (Eds.). (1954). A treasury of science. New York: Harper & Brothers.
(Reprinted from The sidereal messenger, by G. Galilei, 1610)
Sternberg, R. J., & Davidson, J. E. (Eds.). (1995). The nature of insight. Cambridge, MA: The MIT Press.
Tidman, P., & Kahane, H. (2003). Logic and philosophy (9th ed.). Belmont, CA: Wadsworth/Thomson.
Toulmin, S. (1969). The uses of argument. Cambridge, England: Cambridge University Press.
Turrisi, P. A. (Ed.). (1997). Pragmatism as a principle and method of right thinking. The 1903 Harvard lectures
on pragmatism. Albany: State University of New York Press. (Reprinted from C. S. Peirce, 1903; see also The
Commens dictionary of Peirce’s terms, by M. Bergman & S. Paavola, Eds., 2003a, 2003b. Retrieved May 18,
2009, from http://www.helsinki.fi/science/commens/dictionary.html)
Westerland, J., & Fairbanks, D. (2004). Gregor Mendel and “myth-conceptions.” Science Education, 88, 754 – 758.
White, R., & Gunstone, R. (1992). Probing understanding. London: Falmer Press.
Wivagg, D., & Allchin, D. (2002). The dogma of “the” scientific method. The American Biology Teacher, 64(9),
645 – 646.
Woodward, J., & Goodstein, D. (1996). Conduct, misconduct and the structure of science. American Scientist,
84, 479 – 490.
