Basic Inferences of Scientific Reasoning, Argumentation, and Discovery
ANTON E. LAWSON
Organismal, Integrative and Systems Biology, School of Life Sciences, Arizona State
University, Tempe, AZ 85287, USA
DOI 10.1002/sce.20357
Published online 28 May 2009 in Wiley InterScience (www.interscience.wiley.com).
ABSTRACT: Helping students better understand how scientists reason and argue to draw
scientific conclusions has long been viewed as a critical component of scientific literacy
and thus remains a central goal of science instruction. However, differences of opinion persist
regarding the nature of scientific reasoning, argumentation, and discovery. Accordingly,
the primary goal of this paper is to employ the inferences of abduction, retroduction,
deduction, and induction to introduce a pattern of scientific reasoning, argumentation, and
discovery that is postulated to be universal and thus can serve as an instructional framework to
improve student reasoning and argumentative skills. The paper first analyzes three varied
and presumably representative case histories in terms of the four inferences (i.e., Galileo’s
discovery of Jupiter’s moons, Rosemary and Peter Grants’ research on Darwin’s finches,
and Marshall Nirenberg’s Nobel Prize–winning research on genetic coding). Each case
history reveals a pattern of reasoning and argumentation used during explanation testing
that can be summarized in an If/then/Therefore form. The paper then summarizes additional
cases also exemplary of the form. Implications of the resulting theory are discussed in terms
of improving the quality of research and classroom instruction. © 2009 Wiley Periodicals, Inc.
BASIC INFERENCES OF SCIENTIFIC REASONING 337
INTRODUCTION
Scientific literacy as an instructional goal typically includes students’ understanding of
the nature of science and scientific reasoning (e.g., American Association for the Advance-
ment of Science, 1989, 2007; Educational Policies Commission, 1961, 1966; National Re-
search Council, 1990, 1996, 2001). Not surprisingly, numerous papers have been published
in recent years exploring the nature of scientific reasoning, scientific argumentation, and
scientific discovery. Yet differences and disagreements persist (e.g., Allchin, 2006; Alters,
1997; Bonner, 2005; Lawson, 2006a; Samarapungavan, Westby, & Bodner, 2006; Sampson
& Clark, 2008; Westerland & Fairbanks, 2004; Wivagg & Allchin, 2002). Accordingly,
the primary goal of this paper is to analyze several varied case histories to identify basic
inferences and a pattern of scientific discovery that is postulated to be general enough to
serve as an instructional framework to improve student reasoning and argumentative skills.
Identifying basic inferences and such a pattern within varied contexts should help teachers
and curriculum developers design and teach lessons that will help students construct better
understanding of how science works and thus help them become scientifically literate. The
examples may also help researchers improve the quality of their own research.
We should note at the outset that the present view departs somewhat from the view of ar-
gumentation advanced by philosopher Stephen Toulmin (1969) and emphasized by science
educators such as Newton, Driver, and Osborne (1999) and Erduran, Simon, and Osborne
(2004) in that it sees the primary role of argumentation not as convincing others of
one's point of view (although that is certainly part of the story) but rather as discovering
which of several possible explanations for a particular puzzling observation should
be accepted and which should be rejected. Thus, instead of Toulmin’s claims, warrants,
and backings, at the heart of the present theory lie multiple possible explanations,
predictions, and evidence designed to either support or contradict each proposed explanation.
Indeed, the present view suggests that the best argument considers all of the alternatives
and explicitly includes the relevant evidence and reasoning supporting and/or contradicting
each.
We begin by analyzing the reasoning presumably involved in Galileo Galilei’s (1564–
1642) discovery of Jupiter’s moons in 1610 (Galilei, 1610, as translated and reprinted in
Shapley, Rapport, & Wright, 1954, and as initially interpreted by Lawson, 2002a). Galileo’s
discovery is analyzed in terms of the inference of abduction, as defined in the present paper,
and the inferences of retroduction, deduction, and induction as defined by Charles Sanders
Peirce (1839–1914). Peirce was an American philosopher, logician, and mathematician. In
the words of Misak (2004), “His work is staggering in its breadth. . . . But because of the
scattered nature of his work and because he was always out of the academic mainstream,
many of his contributions are just now coming to light” (p. 1).
Peirce is generally credited as the originator of pragmatism, a philosophical view op-
posing logical positivism and favored by contemporaries William James and John Dewey.
Pragmatism, with its roots in Darwinian evolutionary theory, argues against the existence
of absolute or transcendental truth and in favor of a more ecological account of knowledge
generation grounded in inquiry and in the testing and retention of ideas that work. Peirce’s
position most relevant to the task at hand is his view of how theory, observation, and
reasoning interact to test claims that have been advanced to explain puzzling observations.
Following the analysis of Galileo’s discovery, we turn to a similar analysis of Rose-
mary and Peter Grants’ monumental research on Darwin’s finches, which is then followed
by consideration of the Nobel Prize–winning research of Marshall Nirenberg. These case
histories, in turn, are followed by additional examples that explore the extent to which
the resulting pattern of reasoning and argumentation can be generalized to other scientific
Science Education
338 LAWSON
and nonscientific fields. The cases have been selected because they represent human rea-
soning and discovery across a broad and presumably representative range of disciplines
(i.e., astronomy, evolutionary biology, biochemistry, human history, geology, physics, and
engineering).
Before Peirce treated retroduction as an inference, logicians had recognized that the rea-
sonable proposal of an explanatory hypothesis was subject to certain conditions. The
hypothesis cannot be admitted, even as a tentative conjecture, unless it would account for
the phenomena posing the difficulty—or at least some of them. (p. 86)
If . . . the points of light near Jupiter are fixed stars, and . . . I observe the heavens near
Jupiter, then . . . I should see points of light that look like fixed stars. And . . . I do see points
of light that look somewhat like fixed stars. Therefore . . . they may in fact be fixed stars.
. . . although I believed them to belong to the number of the fixed stars, yet they made me
somewhat wonder, because they seemed to be arranged exactly in a straight line, parallel to
the ecliptic, and to be brighter than the rest of the stars, equal to them in magnitude. (p. 59)
When Galileo’s expressed doubt is cast in the If/then/Therefore form of the previous
retroductive argument, we get the following:
If . . . the three points of light near Jupiter are fixed stars, and . . . their sizes, brightness and
positions are compared to each other and to nearby fixed stars, then . . . variations in size,
brightness and position should be random, as is the case for other fixed stars. But . . . “they
seem to be arranged exactly in a straight line, parallel to the ecliptic, and to be brighter than
the rest of the stars.” Therefore . . . the fixed-stars hypothesis is contradicted. Or as Galileo
put it, “yet they made me wonder somewhat.”
fact explain the puzzling observation, he then sought a way to construct a more convincing
test. That more convincing test required generating one or more predictions about future
observations that should occur provided that his new hypothesis is correct. Peirce referred
to this inference as deduction, which he described as follows:
Thus, using deduction,2 Galileo generated an argument that led from his new hypothesis
to two future predictions, that is,
If . . . the three points of light are moons orbiting Jupiter (orbiting-moons hypothesis), and
. . . I observe them over the next several nights (planned test), then . . . some nights they
should appear to the east of Jupiter and some nights they should appear to the west. Further,
they should always appear along a straight line on either side of Jupiter (predictions).
I, therefore, concluded, and decided unhesitatingly, that there are three stars in the heavens
moving about Jupiter, as Venus and Mercury round the sun. . . . These observations also
established that there are not only three, but four, erratic sidereal bodies performing their
revolutions round Jupiter. . . . These are my observations upon the four Medicean planets,
recently discovered for the first time by me. (pp. 60–61)
And . . . some nights they appeared to the east of Jupiter and some nights they appeared
to the west. Further, they always appeared along a straight line on either side of Jupiter
(observed results). Therefore . . . the orbiting-moons hypothesis is supported (conclusion).
2 Like abduction, deduction depends on connections with declarative knowledge. Declarative knowledge
is needed for the thinker to know what is implied by any particular hypothesis in question. Thus, one should
not view deduction as taking place in a strictly “logical” or “necessary” fashion or as a process resulting
in certainty. In this sense, both retroduction and deduction are dependent upon specifics related to the
situation at hand (i.e., what one might call disciplinary or declarative knowledge). For further explication,
see Lawson (2006b). Another key point here is that the thinker is generating predictions about outcomes
that the thinker has yet to observe. But this does not mean that the observations may not have already been
made by someone.
According to Peirce, this final inference, which he called induction, was used to draw
this conclusion (Turrisi, 1903/1997). More generally,
If . . . the predicted and observed results match, like they do in this case, then . . . the
hypothesis is supported. On the other hand, if . . . the predicted and observed results do not
match, then . . . the hypothesis would have been contradicted.
Although Peirce referred to this inference as induction, it is not the form of induction
that some have claimed generates general conclusions from limited cases (e.g., this crow is
black, so is this one, and so on—therefore all crows are black)—a form of “enumerative”
induction that people probably do not use (e.g., Lawson, 2005; Popper, 1965). Rather,
enumerative induction can at best suggest descriptive claims in need of deductive test (e.g.,
all of the crows I have seen are black; thus, perhaps, all crows are black. If . . . all crows are
black, and . . . this new bird is a crow, then . . . I deduce/predict that it will also be black).
The form of induction that Galileo presumably used can be characterized as an inference
that leads to increased confidence in one’s conclusions with each additional supporting or
contradicting result. In Peirce’s words,
Note, however, that consistent with Peirce’s underlying pragmatism, this view of knowl-
edge generation falls short of certainty. One cannot be certain that an explanation is correct
because explanation generation is the product of human imagination and any number of al-
ternatives may lead to the same prediction. Hence, the subsequent observation of a specific
predicted result can, in theory at least, be taken as evidence for more than one explanation.
Likewise, one cannot be certain that a contradicted explanation is in fact wrong because a
mismatch between a prediction and an observation may not be due to a faulty explanation.
It may be due to a faulty test and/or to a faulty deduction. Also as pointed out by authors
such as Brannigan (1981) and Collins (1985), scientific claims are generated and evaluated
within social and cultural contexts that play a role in their acceptance or rejection. Recall
the words of Charles Darwin written in the concluding chapter of The Origin of Species
(initially published in 1859):
Although I am fully convinced of the truth of the views given in this volume under the form
of an abstract, I, by no means expect to convince experienced naturalists whose minds are
stocked with a multitude of facts all viewed during a long course of years, from a point
of view directly opposite to mine. . . . but I look with confidence to the future, to young
and rising naturalists, who will be able to view both sides of the question with impartiality.
(1898 edition, pp. 294–295)
Similarly, in his autobiography, the physicist Max Planck (1949) wrote: “A scientific
truth does not triumph by convincing its opponents and making them see the light, but
rather because its opponents eventually die, and a new generation grows up that is familiar
with it.”
Figure 1. A model of the elements of If/then/Therefore reasoning and argumentation used during the generation
and subsequent test of proposed explanations. Arguments are retroductive when results have been obtained by the
thinker before hypothesis and prediction generation and deductive when results are obtained after.
1. First, thanks to his new and improved telescope (the role of technology), Galileo
undertook a new exploration that led to a puzzling observation (the three unexplained
points of light near Jupiter).
2. Then, thanks to his prior store of declarative knowledge, Galileo used abduction to
generate a hypothesis (a tentative explanation) for the points of light (i.e., perhaps
they are fixed stars).
3. Next, Galileo used retroduction to subconsciously test his fixed-stars hypothesis,
which led to some doubt and then to rejection.
4. Then he once again used abduction to generate another hypothesis (the orbiting-
moons hypothesis), which when presumably checked by retroduction was supported.
5. He then used deduction to generate future predictions, which also require connections
in declarative knowledge.
6. Subsequently, after the cloud cover dissipated, he made the necessary observations,
which matched his predictions.
7. Finally, on the basis of this match, he used induction to draw the conclusion that his
orbiting-moons hypothesis had been supported. Therefore, he was able to proudly
proclaim to the world that he was the first to discover “four, erratic sidereal bodies
performing their revolutions round Jupiter.”
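The seven steps above amount to a testable control loop: abduction supplies candidate explanations, retroduction screens them against what is already known, deduction yields predictions, and induction compares those predictions with new observations. A minimal sketch in Python may make the ordering concrete (all names and data strings here are hypothetical illustrations, not taken from the paper):

```python
class Hypothesis:
    """A tentative explanation; abduction supplies instances of this."""
    def __init__(self, name, explains, prediction):
        self.name = name
        self.explains = explains      # retroductive check against a known fact
        self.prediction = prediction  # deduced claim about future observations

def evaluate(hypothesis, known_facts, observed):
    # Retroduction: reject the hypothesis if it fails to account for known facts.
    if not all(hypothesis.explains(fact) for fact in known_facts):
        return "contradicted (retroduction)"
    # Induction: compare the deduced prediction with the new observation.
    return "supported" if observed == hypothesis.prediction else "contradicted"

# Galileo's two abductions about the points of light near Jupiter:
fixed_stars = Hypothesis(
    "fixed stars",
    explains=lambda fact: fact != "aligned exactly along the ecliptic",
    prediction="random positions night to night")
orbiting_moons = Hypothesis(
    "orbiting moons",
    explains=lambda fact: True,  # accounts for the alignment already observed
    prediction="east/west of Jupiter along a straight line")

known = ["aligned exactly along the ecliptic"]
new_observation = "east/west of Jupiter along a straight line"
print(evaluate(fixed_stars, known, new_observation))    # fails retroduction
print(evaluate(orbiting_moons, known, new_observation))
```

The point of the sketch is only the ordering: the retroductive screen runs on results already in hand, while the inductive comparison must wait on observations made after the prediction.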
Viewed in this way, scientific reasoning and discovery consist of undertaking novel
explorations that lead to puzzling observations that are subsequently explained by the
TABLE 1
Basic Inferences of Scientific Reasoning, Argumentation, and Discovery
Inference: Abduction
Question: What caused the puzzling observation (e.g., the three new points of light near Jupiter)?
Example: If . . . points of light seen in the night sky are caused by fixed stars embedded in the celestial sphere, and . . . three new similar-looking points of light are seen in the night sky, then . . . perhaps they also are fixed stars.

Inference: Retroduction
Question: Does the proposed cause explain what we already know?
Example: If . . . the points of light are fixed stars, and . . . their positions are compared to each other, then . . . their positions should be random. But . . . they appear exactly in a straight line parallel to the ecliptic. Therefore . . . perhaps they are not fixed stars.

Inference: Deduction
Question: What does the proposed cause lead us to predict about future observations?
Example: If . . . the three points of light are moons orbiting Jupiter, and . . . I observe them over the next several nights, then . . . some nights they should appear to the east of Jupiter and some nights they should appear to the west. Further, they should appear along a straight line on either side of Jupiter.

Inference: Induction
Question: How do the predictions and new observations compare?
Example: If . . . the new observations match the predictions based on the orbiting-moons hypothesis, as they do in this case (e.g., some nights the lights appeared to the east of Jupiter and some nights they appeared to the west), then . . . the hypothesis is supported.
cyclic and repeated use of abduction, retroduction, deduction, observation, and induction.
Again in Peirce’s words,
Abduction [retroduction] furnishes all our ideas concerning real things, beyond what are
given in perception, but is mere conjecture, without probative force. Deduction is certain but
relates only to ideal objects. Induction gives us the only approach to certainty concerning
the real that we can have. In forty years diligent study of arguments, I have never found
one which did not consist of these elements. (Bergman & Paavola, 1905/2003a, CP 8.209)
We next consider Rosemary and Peter Grants’ monumental research on Darwin’s finches
of the Galapagos Islands to see if we can identify the same inferences and pattern of
reasoning and argumentation in biological discovery.
Darwin’s finches—massive data collection done without any explicit hypothesis (as one
notable case) has nonetheless led to significant and widely respected claims” (p. 118).
Is Allchin’s characterization of the Grants’ work correct? If so, then their research
certainly would not fit the previous pattern. However, in characterizing the Grants’ research,
Allchin failed to cite any of their original accounts. Indeed, when one does consult what
the Grants say they did and when they did it, a very different picture emerges (e.g., Grant,
1986; Grant & Grant, 1989; P. R. Grant, personal communication, April 4, 2006). Thus,
let us take a close look at just what the Grants had to say about their research (also see
Lawson, 2009a). First, consider Peter Grant’s comments in the Preface of his 1986 book
Ecology and Evolution of Darwin’s Finches:
I chose to study the finches for two quite different reasons. The first arose from a confusion
about the significance of population variation. . . . The second reason sprang from a similar
confusion concerning inter-specific competition. . . . Since the classical case of character
displacement was invalid (Grant 1972b, 1975a), it was logical to turn attention to the
classical case of character release. The classical case involves two species of Darwin’s
Finches on the Galapagos Islands (Brown & Wilson 1956). Despite having two good
reasons for studying the finches, I might never have begun research on them without the
stimulus of a proposal from a prospective postdoctoral Fellow, Ian Abbott. He had developed
a plan for detecting the effects of inter-specific competition among Darwin’s Finches. . . .
We prepared a research proposal and sought financial support. (pp. xi–xii)
Also consider these two quotes from the Preface and opening chapter of Rosemary and
Peter Grants’ follow-up book Evolutionary Dynamics of a Natural Population: The Large
Cactus Finch of the Galapagos (Grant & Grant, 1989):
Genetic variation in quantitative characteristics is the raw material for much of evolution.
A substantial body of theoretical work deals with the maintenance and significance of such
genetic variation. Field studies of the subject have been largely neglected, yet such studies
that employ a theoretical framework can be immensely valuable. (p. xvii)
The theoretical framework sets the scope of the study and helps us to identify major factors
in need of measurement. (p. 11)
These quotes should make it clear that the Grants’ data collection was directed and
preceded by a theoretical framework—specifically evolutionary theory and the classical
case of character release. Also consider this passage from Peter Grant (1986), a passage
that provides the general sequence of their reasoning and research:
Testing the competition hypothesis is difficult, for two reasons. First, the hypothesis deals
with the past. Since we cannot reconstruct those events precisely, we cannot test the
hypothesis directly . . . instead it must be tested through its consequences (predictions). . . .
To put the arguments into a testable framework we must rephrase them, along the following
lines. The observations to be explained are the distributions of species and the inter-island
differences in beak size and shape; the hypothesis is that distribution and morphology
were causally influenced by inter-specific competition for food; the main assumption upon
which the hypothesis rests is that the feeding niche of a population is reflected in, and hence
adequately indexed by, the average beak characteristics. . . . I shall now give two examples
of an examination of the hypothesis through a test of its predictions. . . . We should expect
that G. conirostris on Espanola, with mean beak characteristics intermediate between those
of the absent G. magnirostris, G. fortis, and G. scandens, has an intermediate feeding niche
position too. Not only that, it is expected to combine the niches of the three missing species,
and consequently its niche should be particularly broad. These are falsifiable predictions
because they are not necessarily true3 . . . data to test these predictions were collected in the
early and middle dry seasons . . . in 1973–1979. . . . The predictions were supported by the
results. (p. 301)
Accordingly, we can summarize the Grants’ research in terms of the four inferences
previously used to characterize Galileo’s research. A time sequence is also included to
document the order of events and to make clear that, contrary to Allchin’s claim, the
generation of explicit hypotheses and the deductive derivation of predictions preceded their
“massive” data collection.
Abduction
The evolutionary-based hypothesis put forth to account for the distributions and mor-
phological differences was that they were caused by inter-specific competition for food.
Peter Grant discussed this hypothesis with respect to G. conirostris in a paper published
in 1972 (see Grant, 1986, p. xii). Thus, by 1972, the hypothesis must have been part of
Peter Grant’s declarative knowledge and thus must have been previously generated, perhaps in
response to his reading of the Brown and Wilson paper that discussed the classical case of
character release involving two similar species of Darwin’s finches. Thus, the hypothesis
was then abductively generated (e.g., If . . . the characteristics of the Darwin’s finches stud-
ied by Brown and Wilson were caused by character release, then . . . perhaps the interisland
morphological differences of G. conirostris were similarly caused by character release).
Retroduction
This character-release (i.e., inter-specific competition) hypothesis could then be retro-
ductively tested with an argument that would look something like this:
3 Peter Grant’s use of the term falsifiable should not be seen as adoption of the view (sometimes and
probably mistakenly attributed to Karl Popper) that science progresses only via the falsification (disproof)
of explanatory hypotheses. Upon reflection, it should be clear that a scientist with a novel explanation for
some puzzling observation does not want to falsify his or her explanation (e.g., Woodward & Goodstein,
1996). However, as Grant states, the approach does oblige the scientist to derive and conduct tests that
could in principle contradict the hypothesis in question. One should say contradict, but not falsify, because,
as mentioned, the source of a mismatch between predicted results and observed results might not be due
to a faulty hypothesis. Instead, it might be due to a faulty test and/or a faulty deduction. Nevertheless,
scientists must be willing and able to test their proposed explanations by planning tests that deductively
yield predicted results that may in fact not occur, thus potentially contradict their explanations. For example,
in Galileo’s case, had he not observed, on subsequent nights, the points of light to the east and then to the
west of Jupiter, as predicted, his observations would have contradicted (i.e., “falsified”) his orbiting-moons
hypothesis.
Deduction
As mentioned, one should not simply retroductively test hypotheses. One should also
deductively derive predictions that can then direct the collection of future relevant data.
Accordingly, to further and convincingly test the inter-specific competition hypothesis, the
following deductive argument was generated:
planning of future deductive tests, which are followed by the gathering and analysis of
data. When using Bonner’s Method B, however, hypotheses are generated only after the
collection of data. The hypotheses then serve to explain the already gathered data.
In support of the existence and usefulness of Method B, Bonner cites the Nobel Prize–
winning research of Marshall Nirenberg conducted during the early 1960s. According
to Bonner, Nirenberg’s research followed Method B and asked this descriptive question:
“What amino acid does UUU code for?” At the time, biologists thought that the DNA code
consisted of four letters (adenine, A; guanine, G; cytosine, C; and thymine, T). They also
suspected that the DNA code was first translated into an RNA code, also with four letters,
but with uracil (U) substituting for thymine (T). Hence, an RNA code consisting of combi-
nations of As, Gs, Cs, and Us somehow coded for the production of proteins by somehow
stringing the 20 some amino acids together. So according to Bonner’s interpretation of
Nirenberg’s research, there could have been any 1 of 20 answers to his descriptive question
(e.g., UUU codes for serine, UUU codes for valine, UUU codes for phenylalanine).
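The codon system just described can be illustrated with a small fragment of the now-standard RNA codon table (an illustrative sketch; only three of the 64 codons are shown, and the function and dictionary names are ours, not Nirenberg's):

```python
# Fragment of the standard RNA codon table (triplet -> amino acid).
# Nirenberg and Matthaei's poly-U experiment established the first entry.
CODON_TABLE = {
    "UUU": "phenylalanine",
    "UCU": "serine",
    "GUU": "valine",
}

def translate(rna):
    """Read an RNA string codon by codon into a chain of amino acids."""
    return [CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3)]

# An RNA made only of U's yields a polypeptide of a single amino acid:
print(translate("UUUUUUUUU"))
```

Because every triplet in poly-U RNA is UUU, the resulting chain contains only phenylalanine, which is the polyphenylalanine result at the center of this case history.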
In Bonner’s view, Nirenberg harbored no hypotheses and advanced no predictions about
which amino acid would be produced. Nirenberg simply wanted to know which of the 20
amino acids UUU codes for. In other words, the fact that it turned out to be phenylalanine
was just the way it turned out and was no more or less theoretically significant than UUU
coding for valine, serine, or any other of the 20 some possibilities.
Based on Bonner’s view of Nirenberg’s research, it is surprising to learn how others at
the time responded when they learned of Nirenberg’s phenylalanine result. For example,
consider this response by Francis Crick contained in a paper published in Nature (Crick,
Barnett, Brenner, & Watts-Tobin, 1962):
At the recent Biochemical Congress in Moscow, the audience of Symposium I was startled
by the announcement of Nirenberg that he and Matthaei had produced polyphenylalanine
(that is, a polypeptide all the residues of which are phenylalanine) by adding polyuridylic
acid (that is, an RNA the bases of which are all uracil) to a cell-free system which can
synthesize proteins. (p. 1232)
One has to wonder why the audience was “startled” to learn that a string of Us codes
for phenylalanine and not for say valine or serine. Perhaps, there is more to the story than
Bonner is acknowledging. Also consider Crick’s comment in a letter to Nirenberg dated
January 4, 1962: “The English papers have made rather a fuss about our Nature paper,
which was published on Saturday, but as far as possible I have stressed that it is your discovery
which was the real break-through.”
Crick’s breakthrough sentiment about Nirenberg’s research was echoed in two other
letters to Nirenberg. One letter from the famous French researcher Francois Jacob dated
December 20, 1961, had this to say: “Many thanks for your two manuscripts. It is a
wonderful story. All my congratulations.” The other letter from H. J. Muller of Indiana
University dated February 1, 1962, stated,
Let me express the thanks and appreciation of the Committee that arranged the recent
symposium on RNA coding for your kindness in having come here for the truly remarkable
contribution that you have made. It was inspiring to the older and to the younger hearers
alike to follow the course of the marvelous break-through that you described to us. (All
letters are online at http://profiles.nlm.nih.gov/)
Nirenberg’s colleagues were not the only ones startled and impressed by his “wonderful
story”—his “marvelous breakthrough.” The newspapers were also lauding Nirenberg’s
achievement. Importantly, they placed it in the larger theoretical context of the day. Consider,
for example, the following paragraphs written in an article, titled “NIH Researchers Crack
the Genetic Code,” published in the Medical World News (January 5, 1962):
The enigma of genetic coding, considered a fundamental secret of life, may be on the verge
of solution. In just-published and about-to-be published papers, several research teams are
reporting experimental proof of what has been largely theory: the intricate process by which
structure and function of living organisms are shaped. One group has begun to crack the
DNA-RNA code—the key to the whole mystery. Soon they expect to decipher the entire set
of instructions by which genetic messengers direct the manufacture of proteins—the basic
stuff of life.
The major achievement in RNA research is the work of two young biochemists at the
National Institute of Arthritis and Metabolic Diseases, Drs. Marshall W. Nirenberg and J.
Heinrich Matthaei. Behind their work, however, is a whole series of investigations which
has produced the basic theory and its preliminary experimental support.
Fundamentally, the theory states that the hereditary “blueprints” of the cell structure and
function are coded within the cell nucleus as long-chain molecules of deoxyribonucleic
acid (DNA). These plans are transmitted, in a series of steps, to the cytoplasmic “assembly
line” where they direct the synthesis of each cell’s characteristic products. (p. 18, online at
http://profiles.nlm.nih.gov/)
If we assume that this is a relatively accurate account, then we can see why Nirenberg’s
result caused such a fuss. He not only answered Bonner’s narrow descriptive question
but also provided a key piece of evidence to help answer a much broader causal question,
namely: How does DNA code for the production of proteins? Importantly, by helping answer
this more fundamental theoretical question, Nirenberg had begun to “crack” the genetic
code—a breakthrough worthy of a Nobel Prize.
Thus, Bonner’s characterization of Nirenberg’s research as descriptive and exemplary of
Method B appears misleading. A more accurate interpretation is that Nirenberg was using
Method A. Consequently, his research can be better understood as a theory-driven attempt
to find out how the letters of DNA code for the production of proteins. To do so, Nirenberg
generated a theory claiming that (a) specific combinations of at least three of the four letters
of DNA first serve as a template for the production of RNA; (b) specific combinations of at
least three of the four letters of RNA then serve as a template for sequencing specific amino
acids; and (c) amino acids when strung together make proteins. Accordingly, Nirenberg’s
reasoning and his key deductive and inductive argument can be summarized similarly to
the previous cases of Galileo and the Grants, that is,
If . . . the above theory is correct, and . . . we conduct an experiment with RNA made only
of U’s (imagined test), then . . . a polypeptide molecule should be synthesized and it should
consist of only one type of amino acid (predicted result via deduction). And . . . when Nirenberg
and Matthaei (1961) conducted the test, they found that a polypeptide chain consisting of
only one type of amino acid (i.e., phenylalanine) was produced (observed result). Therefore
. . . support had been found for the theory5 (conclusion via induction).
5 When this If/then/Therefore characterization was read to Nirenberg during a telephone conversation,
he replied: “That’s exactly right.” Also consistent with Nirenberg’s use of Method A and his goal of theory
testing, he said that at the time he did not even know whether the message came from DNA or from RNA,
or for that matter if mRNA even existed (M. W. Nirenberg, personal communication, December 2005).
Science Education
BASIC INFERENCES OF SCIENTIFIC REASONING 349
In sum, the maxim that data should be gathered without guidance by antecedent hypotheses
about the connections among the facts under study is self-defeating, and is certainly not
followed in scientific inquiry. On the contrary, tentative hypotheses are needed to give
direction to scientific investigation. (p. 13)
A moment’s reflection reveals that data collection in the absence of a hypothesis has little
or no scientific value. Suppose, for example, that one day you decide to become a scientist
and having read a standard account of the scientific method you decide to collect some
data. Where should you begin? Should you start by cataloging all the items in your room,
measuring them, weighing them. . . ? Clearly there’s enough data in your room to keep you
busy for the rest of your life. (p. 191)
More recently, however, Mahootian and Eastman (in press) argue that the volume of ob-
servational data and the power of high-performance computing have increased by several
orders of magnitude and have reshaped the practice of science much in the way of Bonner’s
Method B. They advance what they call an observational-inductive (OI) approach to de-
scribe that new practice and to complement what they call the old hypothetico-deductive
(HD) approach. For example, one could now measure say 100 different variables and
use a high-powered computer to virtually instantaneously calculate correlation coefficients
among all 100 variables. Then, without any prior hypotheses, one could sift through the
resulting coefficients to see which ones are relatively large (e.g., ≥0.80). Upon finding
any such large coefficients, one could generate hypotheses to tentatively explain them, then
deduce predictions, and so on.
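The screening step just described can be sketched in a few lines. The synthetic data, sample size, and the 0.80 threshold below are illustrative assumptions, not anything taken from Mahootian and Eastman's paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 100 measured variables (columns) with 500 observations
# each; one dependency is built in so the screen has something to find.
data = rng.standard_normal((500, 100))
data[:, 1] = 0.9 * data[:, 0] + 0.1 * data[:, 1]

# Correlation coefficients among all 100 variables, computed in one call.
r = np.corrcoef(data, rowvar=False)

# Sift for relatively large coefficients (e.g., |r| >= 0.80), ignoring the
# diagonal and duplicate (j, i) entries.
i, j = np.where(np.triu(np.abs(r) >= 0.80, k=1))
large_pairs = list(zip(i.tolist(), j.tolist()))
print(large_pairs)  # pairs of variable indices worth explaining: [(0, 1)]
```

Note that the only pair the screen flags is the one deliberately built into the data; with purely random variables and this many observations, coefficients of 0.80 or more essentially never arise by chance.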
Of course, in theory, one could do this. But we can hear Hempel, Schick, and Vaughn
ask: Why would our imaginary scientist choose those 100 variables and not some other
100? Are there not prior conceptions (i.e., prior hypotheses/theories) involved in knowing
which variables to select and which to omit? If so, then Mahootian and Eastman are
not advancing a fundamentally different approach. Rather our imaginary scientist is still
using prior hypotheses/theories, albeit perhaps on a subconscious plane, to select which
variables to pay attention to and which ones to ignore. When the resulting coefficients are
then calculated and observed, they may fit those prior conceptions or they may not. If they
do not, then the scientist has a new puzzling observation in need of explanation, which of
course would need to be tested via retroduction, deduction, and induction.
Alternatively, it is possible to imagine someone randomly picking 100 variables with
no prior conceptions about those variables, then using a computer to calculate correlation
coefficients, and so on. But this is unlikely to advance our collective scientific knowledge—at
least not very quickly. This is not to say that the OI approach could not be used. It simply
means that if used completely devoid of any guidance on which variables are selected,
the approach is unlikely to be productive. After all, one cannot “stand on the shoulders of
giants” if one cannot find the giants or if one lacks a ladder to climb up on.
Perhaps, another point is in order at this time. We are arguing that science begins
with puzzling observations. Certainly, the encounter with a puzzling observation is not
consciously planned. Recall that Galileo was simply using his new telescope to take a
“random walk” around the “heavens.” His initial observations were not designed to test a
hypothesis or a theory. But this does not mean that he did not have prior conceptions about
what he might see. After all, why did Galileo find the three points of light near Jupiter
puzzling in the first place? The answer is that his immediate assimilation of those points of
light into his “fixed-stars” conception retroductively led to a contradiction, that is,
If . . . the three points of light near Jupiter are fixed stars, and . . . their sizes, brightness and
positions are compared to each other and to nearby fixed stars, then . . . variations in size,
brightness and position should be random, as is the case for other fixed stars. But . . . “they
seem to be arranged exactly in a straight line, parallel to the ecliptic, and to be brighter than
the rest of the stars.” Therefore . . . the fixed stars hypothesis is contradicted. Or as Galileo
put it, “yet they made me wonder somewhat.”
So the point is that all observations are hypothesis/conception driven (i.e., theory laden).
Those that are not puzzling are simply those that match our expectations (our predictions),
whereas those that are puzzling do not match and may, if attended to, eventually result in a
change (an accommodation) in those conceptions via If/then/Therefore reasoning.
It fits the traditional empirical model of hypothesis testing and therefore might apply less
well, for example, in terms of science conducted with archival data sets or observational
contexts such as certain subfields of geology. As a result, the framework is very specific in
terms of the scientific disciplines and contexts to which it applies, but for these disciplines
and contexts it provides a strong structural model to guide instruction and student reasoning.
(pp. 460–461)
If . . . the environment hypothesis is correct, and . . . a group of people from the Bismarck
Archipelago settle in the large and relatively favorable environment of New Zealand,
while another group settles on the small and considerably less favorable environment of the
Chatham Islands 500 miles to the east (planned test), then . . . technological advances should
proceed faster and more fully in New Zealand than on the Chatham Islands. Additionally,
and if and when the two groups come in contact, the New Zealanders should dominate
the Chatham Islanders (deduced predictions). And . . . during the centuries following the
settlement of New Zealand and the Chatham Islands, the two groups of settlers did in
fact develop in opposite directions. The New Zealanders developed complex technology,
political organization, and intense farming practices while the Chatham Islanders reverted
to a loosely-coordinated hunting and gathering society. Further, in December 1835 when
500 armed men from New Zealand arrived on the Chatham Islands, they quickly killed
or enslaved the Chatham Islanders in spite of the fact that they were vastly outnumbered
(archival data). Therefore . . . the environment hypothesis is supported. Further, the innate-
intelligence hypothesis is contradicted because the identical ancestry of both groups of
settlers predicts similar developmental paths (conclusion).
If this account is reasonably accurate, we can conclude that the If/then/Therefore argu-
mentative form is general enough to encompass the use of archival data. Note, however,
whether this argument or any other If/then/Therefore argument should be considered retro-
ductive or deductive depends on when the thinker became aware of the relevant archival
data. Suppose, for example, Diamond generated the following argument before he was
aware of the events of 1835:
If . . . the environment hypothesis is correct, and . . . a group of people from a single location
settle in a large and relatively favorable environment, while another group from the same
location settle on a small and considerably less favorable environment (planned test), then
. . . technological advances should proceed faster and more fully in the first group than in
the second and when the two groups come in contact the first group should dominate the
second (deduced predictions).
Suppose Diamond next sifted through archival data to see whether he could find a
specific case in point. If this was the order of things, then Diamond clearly used a deductive
approach. If, however, Diamond first became aware of the 1835 killing and enslavement
of the Chatham Islanders by the New Zealanders and only then employed the environment
hypothesis to explain that “puzzling observation,” his argument would be retroductive. If
so, he should then deduce similar results and look elsewhere in the archival record to see
whether he can find them.
best example being Meteor Crater in Northern Arizona, as well as the mounting evidence
that the craters covering our moon and other planets were caused by impacting asteroids
and comets. More important, such impact craters are the rule, not the exception. Alvarez
was also aware of two published papers proposing that the dinosaur extinction had been
caused by radiation triggered by the explosion of a nearby star—a supernova.
By 1976, Alvarez began focusing his attention on the KT boundary layer (i.e., the narrow
boundary between the Cretaceous and Tertiary layers). He suspected that the boundary held
the key to the dinosaur extinction and that it could be used to test (via deduction) the more
global uniformitarian versus catastrophic theories. As he put it: “Very rapid deposition of
the clay would suggest a sudden cause for the extinction, but slow deposition would suggest
a gradual mechanism.” (Alvarez, 1997, p. 61)
How then could he find out how long it had taken to deposit the clay? What he needed
was something that had been deposited in the limestone and clay at a constant rate. At this
point, Alvarez enlisted the expertise of his father Luis Alvarez, a physicist at Berkeley. The
elder Alvarez knew that although meteors hit the Earth rarely and at random, meteorite dust,
which contains iridium, falls from outer space at a constant rate across the entire Earth.
Therefore, they came up with a way to indirectly measure the clay’s deposition rate by
measuring the amount of iridium. In other words,
If . . . the extinction of many foram species, and possibly the dinosaurs, was caused by a
catastrophic event (catastrophic-event hypothesis), and . . . the amount of iridium contained
in the clay at the KT boundary layer is measured (imagined test), then . . . a relatively small
amount of iridium should be present—about 0.1 parts per billion (ppb) (predicted result via
deduction). Iridium falls at a constant rate, thus the less iridium in the layer, the less time it
must have taken for deposition. And . . . thanks to Berkeley chemist Frank Asaro, by June of
1978 the initial iridium measurements had been made and they contained another surprise.
Instead of the expected amount of 0.1 ppb, assuming the clay layer had been deposited
slowly, a value of 9 ppb was detected (observed result).
Therefore . . . either the extinction of many foram species, and possibly the dinosaurs, was
not caused by a catastrophic event (conclusion via induction); or perhaps the catastrophic
event itself deposited the unusually large amount of iridium (alternative hypothesis).
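The arithmetic behind this inference is worth making explicit. A back-of-the-envelope restatement, using only the numbers quoted above:

```python
# Back-of-the-envelope restatement of the iridium inference, using only the
# numbers quoted in the text. Assumption: iridium falls at a constant rate, so
# its concentration in a clay layer scales with how long the layer took to form.
expected_ppb = 0.1  # predicted if the KT clay was deposited rapidly
observed_ppb = 9.0  # value measured by Asaro in June 1978

# Under constant fallout alone, the observed concentration would imply a
# deposition interval roughly 90 times longer than the rapid-deposition
# prediction (unless the catastrophic event itself supplied the extra iridium).
implied_factor = observed_ppb / expected_ppb
print(round(implied_factor))  # 90
```

This factor of about 90 is what made the measurement a "surprise": either the clay took far longer to deposit than the catastrophic-event hypothesis allows, or the iridium had an additional source.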
Where had all the iridium come from? Possibilities quickly sprang to mind: Could it have
come from the supernova that Dale Russell and Wallace Tucker had suggested to explain the
dinosaur extinction? Did it come from an impacting asteroid or comet? Or could there be a
non-catastrophic explanation? Maybe the iridium was deposited from seawater somehow.
Or maybe the Earth had encountered a cloud of interstellar dust and gas. (Alvarez, 1997,
p. 69)
Before investing time and energy in testing these possibilities (i.e., alternative hypothe-
ses), Alvarez needed to know whether the iridium anomaly was restricted to the clay bed
around Gubbio or whether it was a global phenomenon. So he went to the library in search
of other known KT sites. At that time, the only other known site was a seaside cliff called
Stevns Klint in Denmark. Thus, Alvarez set off to visit the Stevns Klint deposits. And on
the basis of the following deductive/inductive argument, he concluded that what he found
there supported the catastrophic-event hypothesis:
If . . . the unusually large amount of iridium in the Gubbio clay layer was caused by a
global catastrophic event (catastrophic-event hypothesis), and . . . the amount of iridium
is measured in the other known KT boundary layer at Stevns Klint (imagined test), then
. . . an unusually high level of iridium should also be found in that layer (predicted result via
deduction). And . . . when Alvarez visited the Stevns Klint deposits, he found that they also
contained a narrow clay layer with an unusually high concentration of iridium (observed
result). Therefore . . . the hypothesis was supported (conclusion via induction) and Alvarez
decided that it was time to think about a global explanation for the anomaly.
Thus, on the basis of this interpretation, we can conclude that the If/then/Therefore
pattern of argumentation is general enough to encompass at least some geological research.
We leave it to others to search for other geological cases that may or may not apply. Next let
us briefly consider the so-called “thought experiments” (i.e., instances in which an entire
“experiment” is conducted in one’s mind) to see whether the same argumentative form
applies.
Thought Experiments
As the name implies, thought experiments take place in one’s thoughts. But this does
not mean that they do not include observed results. They do. But the results of thought
experiments have been observed before the experiment has been mentally conducted.
Consequently, the point of a thought experiment is often to reveal via retroduction that the
hypothesis in question must be wrong. It must be wrong because it leads either to a prediction
that does not match with what we have already observed or to contradictory predictions. In
this sense, thought experiments can also be cast in the form of If/then/Therefore arguments,
which provides additional evidence of the form’s generality.
For example, Galileo conceived of one of the most famous thought experiments in
science. He wondered whether Aristotle’s claim that heavier objects fall faster than lighter
objects was correct. (Actually, heavier objects typically do fall faster than lighter objects
in air, but Galileo’s thought experiment was conducted in an idealized world devoid of
fall-resisting air molecules.) As you will see, Galileo’s retroductive reasoning led to the
conclusion that the mass must not matter because if it does, we end up with contradictory
predictions.
If . . . the rate of fall depends on the mass of the object, and . . . we drop a large, heavy
rock next to a smaller, lighter rock, then . . . the larger, heavier rock should hit the ground
first. Further, if . . . the rate of fall depends on the mass of the object, and . . . we now tie
the two rocks together and drop them, then . . . the larger, heavier rock should fall faster. It
should fall faster than before because it is now more massive (prediction). However, when
the rocks are tied together and are falling, the lighter, slower falling rock will produce a
drag on the heavier rock and slow it down. This implies that when tied together the rocks
should fall more slowly (contradictory prediction). Therefore . . . we have two contradictory
predictions implying that the rate of fall must not depend on the mass of the falling objects.
a case in point, let us consider what happened in 1900 when bicycle builders Orville and
Wilbur Wright tried their hand at building an airplane (as cited in Crouch, 1992).
The Wright brothers began by planning to build a small, unmanned glider. To calculate
the size that the glider’s wings would need to develop the necessary lift for flight, they used
an equation called the lift equation. According to the lift equation, the amount of lift created
(L) depends on the total area of the lifting surface (S), the velocity of the flight squared
divided by 2 (V²/2), a coefficient of air pressure (k), and a coefficient of lift (CL), that is,

L = k · S · (V²/2) · CL
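As a worked illustration of how such an equation drives design, and of how an error in its coefficients propagates into a lift shortfall, the sketch below sizes a wing using the text's form of the lift equation. All numeric values are made up for illustration; they are not the Wrights' actual coefficients or measurements:

```python
# The lift equation in the form given in the text: L = k * S * (V**2 / 2) * CL.
# All numeric values below are made up for illustration; they are not the
# Wrights' actual coefficients or measurements.

def lift(k: float, S: float, V: float, CL: float) -> float:
    """Lift from pressure coefficient k, surface area S, velocity V, lift coefficient CL."""
    return k * S * (V ** 2 / 2) * CL

def required_area(L_target: float, k: float, V: float, CL: float) -> float:
    """Wing area needed to produce a target lift, solving L = k*S*(V^2/2)*CL for S."""
    return L_target / (k * (V ** 2 / 2) * CL)

# If the assumed coefficients overstate reality by 20%, a glider built to the
# computed specifications falls short of the needed lift by that same 20%,
# the flavor of shortfall the Wrights observed at Kitty Hawk.
k, V, CL = 0.005, 20.0, 0.5
S = required_area(100.0, k, V, CL)  # area sized with the assumed coefficients
actual = lift(0.8 * k, S, V, CL)    # true pressure coefficient 20% lower
print(S, actual)  # 200.0 80.0
```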
After using the lift equation to calculate the necessary specifications, the brothers built
their glider to these specifications, and in October of 1900 took it to Kitty Hawk, North
Carolina, for testing. The test results were encouraging enough to motivate them to build a
larger manned glider and try it the next year. During that next year, the larger manned glider
made several test flights—the longest 389 ft. However, the tests were largely discouraging
because the manned glider failed to attain the needed lift for eventual self-propelled flight
by some 20%. Thus, a puzzling observation provoked a causal question, namely: Why
did the manned glider not attain the needed lift? Presumably, based on the following
If/then/Therefore reasoning, they concluded that their failure to achieve the necessary lift
was due to faulty specifications used in the lift equation, that is,
If . . . the specifications used in the lift equation are correct, and . . . we build and fly a larger
manned glider to those specifications and determine its amount of lift, then . . . it should
achieve the needed lift for eventual self-propelled flight. But . . . when the manned glider
was taken to Kitty Hawk and tested it did not attain the necessary lift—by some 20%.
Therefore . . . the specifications are probably not correct.
But what in the specifications contained the error? The Wright brothers hypothesized
that the error likely existed in the coefficients used in their calculations (i.e., in the air
pressure coefficient, in the lift coefficient, or in both). They then figured out a way to test
their hypotheses by using a moving bicycle with a spare wheel, free to turn, mounted on its
handlebars.
To deduce the necessary prediction for their test, they used the lift equation and the two
previously used coefficients to calculate that a wing with a surface area of 1 square foot, set
at a 5◦ angle, should precisely balance a flat plate measuring 0.66 of a square foot, set at a
90◦ angle to the air flow. Consequently, to conduct the test, they mounted the spare wheel
on the handlebars; they fixed the 1 square foot wing on the front of the spare wheel at a 5◦
angle; they fixed the 0.66 square foot flat plate on the spare wheel at a 90◦ angle to the air
flow; and they rode the bicycle with its spare wheel, wing, and flat plate down the street.
Based on their calculations, the forces created on the wing and on the flat plate should
precisely balance each other and the spare wheel should not turn. However, when they rode
down the street, the wheel turned. So the error could not be in the surface areas, the actual
lift, or in the velocity. Therefore, they could be reasonably sure that the coefficients used
in the calculations were in fact to blame. Their reasoning can be summarized using the
If/then/Therefore form like this:
If . . . no error exists in the two coefficients, and . . . a bicycle with the spare wheel, a wing,
and a flat plate mounted as described above is ridden down the street, then . . . the forces
exerted by the wind on the wing and on the flat plate should precisely balance and the spare
wheel should not turn. But . . . when the bicycle was ridden down the street, the spare wheel
turned. Therefore . . . the hypothesis is contradicted. In other words, an error must exist in
the coefficients.
So the Wright brothers set out to find out which coefficient was to blame. To do this,
they built a wind tunnel and used a small airfoil mounted on a balance to conduct several
additional hypothesis-driven experiments that soon led to the construction of the first
successful airplane in 1903.
Strong inference consists of applying the following steps to every problem in science,
formally and explicitly and regularly:
Although Platt noted that some research fields collectively embrace these steps, he also
noted that the steps were neither universally understood nor applied. Again in his words,
The difference between the average scientist’s informal methods and the methods of strong-
inference users is somewhat like the difference between a gasoline engine that fires occa-
sionally and one that fires in steady sequence. If our automobile engines were as erratic as
our deliberate intellectual efforts, most of us would not get home for supper. (p. 347)
Therefore, although it would seem incorrect to argue that all scientific research is
consciously guided by cycles of abductive, retroductive, deductive, and inductive infer-
ences, it might, nevertheless, be argued that the odds of success would improve if they
were consciously applied. Doing so, however, would require that researchers become more
aware of their reasoning in successful instances so that they are better able to repeat this
successful reasoning in subsequent instances. In cognitive terms, the consciousness issue
appears to be one of “metacognition”—a term coined by Flavell (1979). Metacognition
literally means thinking about one's thinking; thus it refers to an individual's ability to stand
apart from his or her own thinking, reflect on it, and subsequently improve it. Thus,
in cognitive terms, what Platt is calling for is more reflectivity, more metacognition, on
the part of researchers. Increased reflectivity would presumably lead to a greater
awareness/consciousness of the reasoning process so that researchers might waste less time gathering
irrelevant data and instead would more quickly move to the explicit generation and retro-
ductive and then deductive tests of alternative hypotheses and predictions. This view is
consistent with more recent so-called “dual-processing” accounts of reasoning and social
cognition, which posit the existence of cognitive processes that are fast, automatic, and
unconscious and those that are slow, deliberate, and conscious (e.g., Evans, 2008).
Platt’s argument for raising consciousness among scientists can be applied in the sci-
ence classroom as well. In short, students need to engage in more lessons in which they
have opportunities to explore nature and confront puzzling observations and the resulting
causal questions. They then need a skilled teacher who allows and encourages them to
generate and test alternative hypotheses and then reflect on what they have done, thus
“exercise” and become more conscious of their nascent inferential skills. Many science
educators have previously expressed a similar view with various degrees of explicitness.
For example, Berland and Reiser (2009) recently explored the usefulness of a framework
proposed by McNeill and Krajcik (2007). The McNeill–Krajcik framework contains these
three components: (1) Claim—the answer to the question, the piece to be defended by
evidence and reasoning; (2) Evidence—information or data that supports the claim; and (3)
Reasoning—a justification that shows why the data count as evidence to support the claim.
The McNeill–Krajcik framework has elements in common with the present framework.
However, it lacks some of the present framework’s explicitness and completeness.
Also consider the Predict-Observe-Explain (POE) framework proposed by White and
Gunstone (1992). During POE instruction students are first asked to predict the outcome
of some sort of exploration or manipulation and then asked to justify their prediction. This
is usually done in an area in which they are likely to generate a false prediction based
on a misconception. Students then make the relevant observation, usually of a discrepant
event that contradicts their prediction. Finally, they are asked to explain the discrepancy
in an effort to change their misconception. Viewed in terms of the present theory, we can
interpret a student’s justification, their misconception, as an alternative hypothesis that
deductively generated their previously stated prediction. Thus, the subsequent observation,
which does not match their prediction, contradicts their hypothesis and leads to the need to
generate an alternative hypothesis (an alternative conception) that retroductively generates
a prediction that matches what they have just observed. The only elements missing from
this POE framework (albeit three very important ones) are the need for students to then
(1) plan some new tests of the alternatives based on deduction, (2) conduct the tests and
compare the test results with their deductively derived predictions, and (3) use induction to
conclude that the alternatives have been supported or contradicted, thus replace their prior
misconception with a more scientifically acceptable one.
Specifically, when put into practice, either in scientific research or in the science class-
room, the present framework distinguishes among an argument’s declarative elements (i.e.,
puzzling observations, causal questions, hypotheses, planned tests, predictions, conducted
tests, results, and conclusions) and its procedural elements (i.e., abduction, retroduction,
deduction, and induction). Furthermore, the present framework details how the declara-
tive and procedural elements interact in the following manner: (1) An exploration phase
occurs in which a puzzling observation is made; (2) a causal question is raised; (3) a cre-
ative brainstorming phase occurs in which multiple hypotheses are abductively generated;
(4) next, a phase occurs in which tests are planned that retroductively and later deductively
lead to explicitly stated predictions; (5) evidence is then gathered that at least “in theory”
might contradict each hypothesis; (6) predictions and evidence are compared to allow, via
induction, the drawing of a conclusion; and (7) oral and/or written arguments are prepared
and presented that include the evidence and the If/then/Therefore reasoning for and against
each of the hypotheses.
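One way to make the interaction of declarative and procedural elements concrete is to model a single If/then/Therefore argument as a small data structure. The sketch below is my own illustrative rendering, not Lawson's; the class and field names are invented, and it is populated with the inverted-glass candle example from Lawson (2002b) discussed below:

```python
from dataclasses import dataclass

# Hypothetical rendering (class and field names are mine, not Lawson's) of the
# framework's declarative elements for a single If/then/Therefore argument.
@dataclass
class Argument:
    puzzling_observation: str
    causal_question: str
    hypothesis: str       # generated via abduction/retroduction
    planned_test: str
    prediction: str       # derived via deduction
    observed_result: str
    result_matches: bool  # comparison of prediction with evidence

    def conclusion(self) -> str:
        # Induction: matching evidence supports the hypothesis; a mismatch
        # contradicts it.
        verdict = "supported" if self.result_matches else "contradicted"
        link = "And" if self.result_matches else "But"
        return (f"If {self.hypothesis}, and {self.planned_test}, "
                f"then {self.prediction}. {link} {self.observed_result}. "
                f"Therefore the hypothesis is {verdict}.")

# Populated with the inverted-glass candle example from Lawson (2002b).
arg = Argument(
    puzzling_observation="water rises in a glass inverted over a burning candle",
    causal_question="Why does the water rise?",
    hypothesis="CO2 from the flame dissolves rapidly into the water",
    planned_test="water rise is compared in CO2-saturated vs. normal water",
    prediction="the water should rise less in the CO2-saturated container",
    observed_result="the water rise is the same in both containers",
    result_matches=False,
)
print(arg.conclusion())
```

The point of the sketch is simply that the framework's declarative elements (observation, question, hypothesis, test, prediction, result, conclusion) are distinct slots, while the procedural inferences (abduction, retroduction, deduction, induction) are the operations that fill and connect them.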
Unfortunately, in terms of implementing such lessons, a recent survey (Oehrtman &
Lawson, 2008) found that a majority of experienced high school science teachers (63%)
were unaware of the distinction between hypotheses and predictions. Perhaps, even worse,
41% of them failed to distinguish evidence from conclusions. Such lack of awareness
is also common in instructional materials. For example, in a published set of high school
physical science lessons, Hsu (2005) defines a hypothesis as “A sentence describing what
you think your experiment should demonstrate” (p. 9). And in a series of published high
school general science lessons, Cothron, Giese, and Rezba (2006) offer this definition
and example: “A hypothesis is a prediction of the effect that changes in the independent
variable will have on the dependent variable. One possible hypothesis would be: If the
amount of salt in the water is increased, then the water will evaporate more slowly”
(p. 45). Failing to differentiate hypotheses from predictions in this way not only loses
the “logic” of hypothesis testing but also loses the central goal of doing science, which
is to generate and test explanations. Small wonder so many teachers and students are
perplexed.
Indeed, many, if not most, published lessons fail to begin with puzzling observations
in need of explanation. For example, who among us has not seen a lesson similar to one
recently sent to me by a curriculum developer from a nearby school district? The lesson
begins with students observing different types of birdseed. Students are then asked to
generate a hypothesis about which type they think birds would prefer and then test their
hypothesis. Unfortunately, there is nothing to explain here—no puzzling observation and
no causal question. Consequently, there is no need for hypotheses. At best, the lesson
calls for students to make predictions about what type the birds might prefer. Most likely,
students will have no idea why, or even whether, birds might prefer one type over another.
Nevertheless, perhaps some accommodating student will make a prediction (recall White
and Gunstone’s POE instructional framework). Having done so, the alert teacher can then
ask the student to explain why the student made the prediction. If the student can then offer
a possible reason for the prediction (e.g., I think birds will prefer the type containing lots
of little yellow seeds because those seeds are easier to crack open), the teacher can identify
such a reason as a hypothesis, which could subsequently be tested.
Thus, more lessons are needed that begin with students making puzzling observations
that can then be collaboratively and collectively explained via hypothesis generation and
test. For example, Lawson (2002b) conducted such a lesson with college students who
were challenged to generate and test multiple hypotheses about why water rose in a glass
inverted over a burning candle that was standing in a pan of water. A key aspect of this and
similar lessons is making sure that students freely generate several hypotheses. Initially,
however, many students are reluctant to generate a hypothesis for fear of being wrong.6
This is particularly so if the teacher allows hypotheses to be “critiqued” following their
generation. For example, students often generate the hypothesis that water rises because
oxygen is being consumed by the flame and the resulting vacuum “sucks” the water up. At
hearing such a hypothesis, an alert classmate having used retroductive reasoning may be
tempted to exclaim: “That cannot be right. If it were right, then the water should stop rising
after the candle goes out. But we saw that it doesn’t!” Or another equally alert classmate
may retroductively add: “That cannot be right. If it were right, then we would have
destroyed oxygen. But we know from chemistry class that combustion does not destroy
oxygen. Instead it converts it into carbon dioxide.”
Allowing these sorts of retroductive critiques during the hypothesis-generation phase of
instruction severely restricts the number of alternatives abducted. Consequently, following
good brainstorming techniques, retroductive arguments should be put on hold until students
have generated all of the hypotheses they can think of. Only then should the teacher
challenge students to use both retroductive and deductive reasoning to test the alternatives.
Students should also be told to try to test all of the generated hypotheses, not just the ones
they think might be right. In short, to produce the strongest argument, their job should
be one of not only finding evidence in favor of one hypothesis but also finding evidence
against the alternatives. Teachers should also point out that the “correct” answer may be
some combination of the generated hypotheses, or perhaps a hypothesis that has yet to be
generated.
Following much sharing of ideas, much experimentation, and much argumentation,
some of the students who participated in the candle burning lesson described above were
successful in constructing verbal and then written If/then/Therefore arguments summarizing
how they had deductively tested each hypothesis and what conclusions they were able to
draw, for example:
If . . . the water rises because carbon dioxide molecules dissolve rapidly into the water
(hypothesis), and . . . the height of water rise in two containers is compared—one with CO2
saturated water and one with normal water (planned test), then . . . the water should rise less
in the container with the CO2 saturated water than in the container with the normal water
(prediction). But . . . the water rise is the same in both containers (result). Therefore . . . the
dissolving-CO2 hypothesis is probably wrong (conclusion).
Success in conducting such a test and in constructing such an argument implies that
these students reasoned in a context in which the hypothesized causal agent (dissolving
CO2 molecules) was nonperceptible. Furthermore, to link the imagined causal agent to the
experimental manipulation (i.e., the amount of dry ice in the two containers), the students
presumably had to understand a theoretical rationale that goes something like this:
6 To encourage multiple hypothesis generation, teachers need to ask divergent, rather than convergent, questions. For example, students are much more willing to venture a “guess” if asked “What might have caused the water to rise?” as opposed to “What caused the water to rise?”
Science Education
360 LAWSON
TABLE 2
Retroductive Arguments Constructed by Students While Attempting to
Test Hypotheses (from Lawson, 2002b)
Testing the dissolving-CO2 hypothesis
If . . . the oxygen is converted to carbon dioxide, and . . . the carbon dioxide dissolves
in the water, then . . . the inside pressure should be less than outside causing water
to rise. And . . . the water did rise. Therefore . . . the hypothesis is correct due to
rising of the water.
Testing the expanding-water hypothesis
If . . . water absorbs heat from the flame, and . . . that causes water to expand, then
. . . we should see the water rise. And . . . it does. Therefore . . . the hypothesis is not
disproved.
Testing the consumed-oxygen hypothesis
If . . . oxygen is consumed creating a partial vacuum, and . . . it causes a vacuum into
which the water is sucked, then . . . the water level should rise, which it does.
Therefore . . . the hypothesis is supported.
Testing the phlogiston hypothesis
If . . . the candle is lit before covering it with the jar, then . . . the water should rise
when the flame (phlogiston) goes out, and . . . the water did rise. Therefore . . . the
hypothesis is supported.
Dissolving CO2 molecules presumably cause a reduction of air pressure in the cylinder. This
reduction in turn causes the water to rise. Consequently, when the water is already saturated
with CO2 molecules, the newly created CO2 molecules cannot dissolve into the water, hence
the internal pressure will not be reduced and the water will not rise.
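The pressure-reduction step in the dissolving-CO2 rationale can be made quantitative with the ideal gas law; the volume, temperature, and mole numbers below are illustrative assumptions, not measurements from the lesson.

```python
# Ideal gas law sketch: removing gas molecules (by dissolution into the water)
# at fixed volume V and temperature T lowers the pressure by
#   delta_P = delta_n * R * T / V.
# All values below are illustrative assumptions.

R = 8.314        # molar gas constant, J/(mol*K)
T = 293.0        # room temperature, K (assumed)
V = 0.001        # gas volume inside the cylinder, m^3 (assumed, about 1 L)
delta_n = 0.004  # moles of CO2 lost from the gas phase into the water (assumed)

delta_P = delta_n * R * T / V  # pressure drop inside the cylinder, in pascals
print(f"pressure drop: {delta_P:.0f} Pa")
# The outside air pressure now exceeds the inside pressure, pushing water up.
```

If the water is already CO2 saturated, delta_n is near zero, so no pressure drop occurs and the water should not rise — which is exactly the deductive prediction the students tested.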
The theoretical rationale in this case is used to link the imagined causal agent (i.e.,
dissolved CO2 molecules) to the manipulated (i.e., independent) variable in the experiment
(i.e., the amount of dry ice added to the two containers). Yet, several other students could
do no better than generate retroductive arguments such as those listed in Table 2. The argu-
ments in the table certainly suggest that these students failed to understand the limitations
of retroductive reasoning and failed to appreciate the need for deductive tests with clearly
stated predictions. Nevertheless, retroductive reasoning can be very important—recall the
retroductive nature of thought experiments. Also consider Albert Einstein’s general relativity theory, which in 1915 retroductively explained the puzzling 43 arcseconds per century
shift in Mercury’s orbit. Importantly, the theory also deductively predicted that starlight
passing the sun would be displaced outward by 1.7 arcseconds—a prediction that was subsequently confirmed by astronomical observations made in 1919. When a graduate student later asked Einstein what he would have done had the observations (made by Sir Arthur Eddington) shown his theory wrong, he replied: “Then I would have been sorry for the dear Lord; the theory is correct” (Isaacson, 2007, p. 259).
Presumably, Einstein was speaking somewhat in jest. In fact, when subsequent observations made by Edwin Hubble in the 1920s contradicted a claim of the then-current version of general relativity theory (i.e., that the universe is static rather than expanding), Einstein was quick to modify the theory to take Hubble’s result into account. In this instance, modification was relatively easy because Einstein’s original version of the theory had in fact predicted an expanding universe. But at the time the theory was generated, the available evidence implied a nonexpanding universe. Accordingly, Einstein had added a “cosmological constant” to his field equations to keep the theory consistent with a static universe. Later, Einstein would call this addition “the biggest blunder he ever made in his life” (Isaacson, 2007, pp. 355–356).
It was a huge blunder because had the constant not been added, the theory would have
been left predicting an expanding universe—a prediction that would have been confirmed
several years later, making Einstein all the more famous.
Interestingly, when the students who constructed the retroductive arguments listed in
Table 2 were asked to generate and test hypotheses about the possible cause(s) of vari-
ation in the speed of a pendulum’s swing (i.e., What causes some pendulums to swing
faster than others?), they had no problem in doing so and in later constructing deductive
If/then/Therefore arguments like this one:
If . . . the amount of weight causes changes in swing rates, and . . . the weights are varied
while holding other possible causes constant, then . . . rate of pendulum swing should vary.
But . . . when we conducted the experiment, we found that the rates did not vary. Therefore
. . . the weight hypothesis is contradicted.
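The pendulum result is consistent with the standard small-angle period formula for a simple pendulum, in which the bob’s mass does not appear:

```latex
% Small-angle period of a simple pendulum: only length L and gravitational
% acceleration g matter; the mass of the bob does not appear.
T = 2\pi\sqrt{\frac{L}{g}}
```

So varying the weight while holding the other variables constant should, and does, leave the swing rate unchanged; length, not weight, is the variable that actually controls the period.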
Although this pattern of argumentation is the same as that used to deductively test
the dissolving CO2 hypothesis, here a theoretical rationale is not needed because the test
involves an experiment in which the possible cause is directly manipulated. In other words,
the proposed cause is the amount of weight and the experiment’s independent variable
also is the amount of weight. Importantly, this variable can be easily manipulated because
weight differences can be sensed. Thus, causal hypothesis testing appears to occur on
two qualitatively different levels, with success at testing hypotheses involving perceptible
causal agents as a likely prerequisite for becoming proficient at testing hypotheses involving
nonperceptible theoretical entities. Accordingly, students may first become generally skilled at
testing hypotheses about perceptible causal agents. And, perhaps, only then, given the
necessary developmental conditions, do they become generally skilled at testing hypotheses
about nonperceptible causal agents (cf., Lawson et al., 2000).
Consequently, teachers should provoke students to construct, reflect on, and then try to produce written arguments about what they have done in pendulum-like contexts before asking
them to do so in contexts in which the hypothesized causal agents are nonperceptible. For
example, consider the question that Sampson and Clark (2008) posed to students, that is,
Why do some objects, such as a metal and a wooden spoon, feel like they are at different
temperatures even though they have been sitting in the same room for several hours? Here,
at least two levels of responses are possible. The first level can be provoked by first asking
students to feel several objects and report which ones feel colder, warmer, and so on.
Upon doing so, students will report that some objects (e.g., metal ones) feel colder than
other objects (e.g., wooden ones). These observations raise a causal question: Why do
metal objects feel colder than wooden objects? Students can then generate some alternative
hypotheses: for example, metal objects feel colder because they are colder. They can test
this hypothesis by measuring the temperatures of the objects in question: If . . . metal
objects feel colder than wooden objects because they are colder, and . . . we measure the
temperatures of the metal and the wooden objects, then . . . the measured temperatures
of the metal objects should be lower. Of course the students’ results will contradict the
hypothesis: that is, but . . . the temperatures of the metal and wooden objects are the same.
Therefore . . . the hypothesis is contradicted. So the students will now have encountered a
real puzzling observation, namely, some objects feel colder than others in spite of the fact
that they are at the same temperature!
This puzzling observation raises a second, higher-level causal question, to which students
can again be asked to generate hypotheses. However, at least for the middle school students
interviewed by Sampson and Clark, the sorts of hypotheses needed here and their means
of testing are probably beyond their reach. Nevertheless, here is one hypothesis and a way
to test it: If . . . metal objects feel colder than wooden objects at any given temperature
because the metals’ atoms are packed closer together—hence conduct heat better—hence
feel colder, and . . . we measure and compare the densities of the objects, then . . . the metal
objects should have greater densities than the wooden objects. Of course, upon measuring
and comparing densities, the students will find that, as predicted, the metals are denser.
Therefore . . . they can conclude that the hypothesis has been supported.
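Rough room-temperature handbook values (ballpark figures for illustration only, not data from the lesson) show the predicted pattern:

```python
# Approximate room-temperature properties (handbook ballpark values, for
# illustration only): density in kg/m^3, thermal conductivity in W/(m*K).
materials = {
    "copper": {"density": 8960, "conductivity": 400},
    "steel":  {"density": 7850, "conductivity": 50},
    "wood":   {"density": 700,  "conductivity": 0.15},
}

# As predicted, the denser materials here also conduct heat better, so they
# draw heat from the hand faster and feel colder at the same temperature.
for name, props in sorted(materials.items(), key=lambda kv: -kv[1]["density"]):
    print(f'{name}: density={props["density"]}, conductivity={props["conductivity"]}')
```

The pattern holds here, though strictly speaking a metal’s high conductivity owes more to its mobile electrons than to density alone, so the density test provides supporting rather than conclusive evidence — consistent with the conclusion that the hypothesis is supported, not proven.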
Unfortunately, developing many such hypothesis-driven, inquiry-based lessons and prop-
erly matching the lessons’ intellectual demands with the students’ initial reasoning skills
and their declarative knowledge remains an unmet educational challenge.7 A related un-
met challenge is educating teachers so that they (1) understand the underlying patterns of
reasoning and argumentation and (2) understand how best to teach such lessons so that
students become better able to abductively generate and then test alternative hypotheses
using retroduction, deduction, and induction.
ACKNOWLEDGMENTS
The author thanks John Alcock for several helpful comments during the preparation of the manuscript.
Any opinions, findings, and conclusions or recommendations expressed in this publication are those
of the author and do not necessarily reflect the views of the National Science Foundation.
REFERENCES
Allchin, D. (2006). Lawson’s shoehorn—Reprise. Science & Education, 15, 113 – 120.
Alters, B. J. (1997). Whose nature of science? Journal of Research in Science Teaching, 34(1), 39 – 55.
Alvarez, W. (1997). T. rex and the crater of doom. Princeton, NJ: Princeton University Press.
American Association for the Advancement of Science. (1989). Project 2061: Science for all Americans.
Washington, DC: Author.
American Association for the Advancement of Science. (2007). Atlas of scientific literacy (Vol. 2). Washington,
DC: Author.
Bergman, M., & Paavola, S. (Eds.). (2003a). The Commens dictionary of Peirce’s terms. Retrieved from
http://www.helsinki.fi/science/commens/dictionary.html. (Reprinted from A letter to Calderoni, by C. S. Peirce,
1905)
Berland, L. K., & Reiser, B. J. (2009). Making sense of argumentation and explanation. Science Education, 93(1), 26 – 55.
Biela, A. (1993). Psychology of analogical inference. Stuttgart, Germany: S. Hirzel Verlag.
Bonner, J. J. (2005). Which scientific method should we teach & when? The American Biology Teacher, 67(5),
262 – 264.
Brannigan, A. (1981). The social basis of scientific discoveries. Cambridge, England: Cambridge University Press.
Collins, H. M. (1985). Changing order. London: Sage.
Cothron, J. H., Giese, R. N., & Rezba, R. J. (2006). Students and research. Dubuque, IA: Kendall/Hunt.
Crick, F. H. C., Barnett, L., Brenner, S., & Watts-Tobin, R. J. (1962). General nature of the genetic code
for proteins. Nature, 192, 1227 – 1232.
Crouch, T. D. (1992). Why Wilbur and Orville? Some thoughts on the Wright brothers and the process of invention.
In R. J. Weber & D. N. Perkins (Eds.), Inventive minds (pp. 80 – 96). New York: Oxford University Press.
Darwin, C. (1898). The origin of species (7th ed.). New York: Appleton & Company.
Diamond, J. (1997). Guns, germs, and steel. New York: Norton.
Educational Policies Commission. (1961). The central purpose of American education. Washington, DC: National
Education Association of the United States.
7 In terms of learning cycle instruction, lessons in which students generate and deductively test alternative hypotheses have been called hypothetico-deductive or hypothetical-predictive learning cycles (e.g., Lawson, 1995, 2009b; Lawson, Abraham, & Renner, 1989). Learning cycles in which students explore nature and simply identify patterns and/or make puzzling observations without generating possible explanations have been called descriptive learning cycles. And learning cycles in which students confront puzzling observations and generate possible explanations, but test them only with previously gathered data, have been called empirical-abductive (i.e., retroductive) learning cycles. Thus, the three types of learning cycles represent segments along a continuum from descriptive to experimental science. As such, they place differing demands on student initiative, knowledge, and reasoning skill.
Educational Policies Commission. (1966). Education and the spirit of science. Washington, DC: National Educa-
tion Association of the United States.
Erduran, S., Simon, S., & Osborne, J. (2004). TAPping into argumentation: Developments in the application of
Toulmin’s argument pattern for studying science discourse. Science Education, 88(6), 915 – 933.
Evans, J. S. (2008). Dual-processing accounts of reasoning, judgment and social cognition. Annual Review of
Psychology, 59, 255 – 278.
Finke, R. A., Ward, T. B., & Smith, S. M. (1992). Creative cognition: Theory research and practice. Cambridge,
MA: The MIT Press.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry.
American Psychologist, 34, 306 – 326.
Gentner, D. (1989). The mechanisms of analogical learning. In S. Vosniadou & A. Ortony (Eds.), Similarity and
analogical reasoning. Cambridge, England: Cambridge University Press.
Giere, R. N., Bickle, J., & Mauldin, R. F. (2006). Understanding scientific reasoning (5th ed.). Belmont, CA:
Thomson Higher Education.
Grant, P. R. (1986). Ecology and evolution of Darwin’s finches. Princeton, NJ: Princeton University Press.
Grant, B. R., & Grant, P. R. (1989). Evolutionary dynamics of a natural population: The large cactus finch of the
Galapagos. Chicago: University of Chicago Press.
Hanson, N. R. (1958). Patterns of discovery. London: Cambridge University Press.
Hempel, C. (1966). Philosophy of natural science. Upper Saddle River, NJ: Prentice-Hall.
Holyoak, K. J. (2005). Analogy. In K. J. Holyoak & R. G. Morrison (Eds.), The Cambridge handbook of thinking
and reasoning (pp. 117 – 142). New York: Cambridge University Press.
Hsu, T. (2005). Foundations of physical science investigations (2nd ed.). Peabody, MA: CPO Science.
Isaacson, W. (2007). Einstein: His life and universe. New York: Simon & Schuster.
Koestler, A. (1964). The act of creation. London: Hutchinson.
Lawson, A. E. (1995). Science teaching and the development of thinking. Belmont, CA: Wadsworth.
Lawson, A. E. (2002a). What does Galileo’s discovery of Jupiter’s moons tell us about the process of scientific
discovery? Science & Education, 11(1), 1 – 24.
Lawson, A. E. (2002b). Sound and faulty arguments generated by pre-service biology teachers when testing
hypotheses involving un-observable entities. Journal of Research in Science Teaching, 39(3), 237 – 252.
Lawson, A. E. (2003). The nature and development of hypothetico-predictive argumentation with implications
for science teaching. International Journal of Science Education, 25(11), 1387 – 1408.
Lawson, A. E. (2004). T. rex, the crater of doom, and the nature of scientific discovery. Science & Education, 13,
155 – 177.
Lawson, A. E. (2005). What is the role of induction and deduction in reasoning and scientific inquiry? Journal of
Research in Science Teaching, 42(6), 716 – 740.
Lawson, A. E. (2006a). Allchin’s errors and misrepresentations and the H-D nature of science. Science Education,
90(2), 289 – 292.
Lawson, A. E. (2006b). Developing scientific reasoning patterns in college biology. In J. J. Mintzes & W. H.
Leonard (Eds.), Handbook of college science teaching: Theory, research, and practice (pp. 109 – 118).
Washington, DC: National Science Teachers Association.
Lawson, A. E. (2009a). On the hypothetico-deductive nature of science—Darwin’s finches. Science & Education,
18(1), 119 – 124.
Lawson, A. E. (2009b). Teaching inquiry science in middle and secondary schools. Thousand Oaks, CA:
Sage.
Lawson, A. E., Abraham, M. R., & Renner, J. W. (1989). A theory of instruction: Using the learning cycle to teach
science concepts and thinking skills. Cincinnati, OH: National Association for Research in Science Teaching.
Lawson, A. E., Clark, B., Cramer-Meldrum, E., Falconer, K. A., Kwon, Y. J., & Sequist, J. M. (2000). The
development of reasoning skills in college biology: Do two levels of general hypothesis-testing skills exist?
Journal of Research in Science Teaching, 37(1), 81 – 101.
Lawson, D. I., & Lawson, A. E. (1993). Neural principles of memory and a neural theory of analogical insight.
Journal of Research in Science Teaching, 30(10), 1327 – 1348.
Mahootian, F., & Eastman, T. E. (in press). Complementary frameworks of scientific inquiry: Hypothetico-
deductive, hypothetico-inductive, and observational inductive. World Futures: The Journal of General Evolution.
McNeill, K. L., & Krajcik, J. (2007). Middle school students’ use of appropriate and inappropriate evidence in
writing scientific explanations. In M. C. Lovett & P. Shah (Eds.), Thinking with data: The proceedings of the
33rd Carnegie Symposium on Cognition (pp. 233 – 265). Mahwah, NJ: Erlbaum.
Misak, C. (2004). Charles Sanders Peirce (1839 – 1914). In C. Misak (Ed.), The Cambridge companion to Peirce.
Cambridge, England: Cambridge University Press.
National Research Council. (1990). Fulfilling the promise: Biology education in the nation’s schools. Washington,
DC: National Academies Press.
National Research Council. (1996). National Science Education Standards. Washington, DC: National Academies
Press.
National Research Council. (2001). Educating teachers of science, mathematics, and technology. Washington,
DC: National Academies Press.
Newton, P., Driver, R., & Osborne, J. (1999). The place of argumentation in the pedagogy of school science.
International Journal of Science Education, 21, 553 – 576.
Nirenberg, M. W., & Matthaei, J. H. (1961). The dependence of cell-free protein synthesis in E. coli upon naturally
occurring or synthetic polyribonucleotides. Proceedings of the National Academy of Sciences of the United
States of America, 47(10), 1580 – 1588.
Oehrtman, M., & Lawson, A. E. (2008). Connecting science and mathematics: The nature of proof and disproof
in science and mathematics. International Journal of Science and Mathematics Education, 6(2), 377 – 403.
Planck, M. (1949). Scientific autobiography (E. Guynor, Trans.). New York: Philosophical Library.
Platt, J. R. (1964). Strong inference. Science, 146, 347 – 353.
Polya, G. (1954). Patterns of plausible inference. Princeton, NJ: Princeton University Press.
Popper, K. (1965). Conjectures and refutations: The growth of scientific knowledge. New York: Basic Books.
Samarapungavan, A., Westby, E. L., & Bodner, G. M. (2006). Contextual epistemic development in science: A
comparison of chemistry students and research chemists. Science Education, 90(3), 468 – 495.
Sampson, V., & Clark, D. B. (2008). Assessment of the ways students generate arguments in science education:
Current perspectives and recommendations for future directions. Science Education, 92(3), 447 – 472.
Schick, T. S., Jr., & Vaughn, L. (1995). How to think about weird things. Mountain View, CA: Mayfield.
Shapley, H., Rapport, S., & Wright, H. (Eds.). (1954). A treasury of science. New York: Harper & Brothers.
(Reprinted from The sidereal messenger, by G. Galilei, 1610)
Sternberg, R. J., & Davidson, J. E. (Eds.) (1995). The nature of insight. Cambridge, MA: The MIT Press.
Tidman, P., & Kahane, H. (2003). Logic and philosophy (9th ed.). Belmont, CA: Wadsworth/Thomson.
Toulmin, S. (1969). The uses of argument. Cambridge, England: Cambridge University Press.
Turrisi, P. A. (Ed.). (1997). Pragmatism as a principle and method of right thinking. The 1903 Harvard lectures
on pragmatism. Albany: State University of New York Press. (Reprinted from C. S. Peirce, 1903; see also The
Commens dictionary of Peirce’s terms, by M. Bergman & S. Paavola, Eds., 2003a, 2003b. Retrieved May 18,
2009, from http://www.helsinki.fi/science/commens/dictionary.html)
Westerlund, J., & Fairbanks, D. (2004). Gregor Mendel and “myth-conceptions.” Science Education, 88, 754 – 758.
White, R., & Gunstone, R. (1992). Probing understanding. London: Falmer Press.
Wivagg, D., & Allchin, D. (2002). The dogma of “the” scientific method. The American Biology Teacher, 64(9),
645 – 646.
Woodward, J., & Goodstein, D. (1996). Conduct, misconduct and the structure of science. American Scientist,
84, 479 – 490.