
JOURNAL OF RESEARCH IN SCIENCE TEACHING VOL. 37, NO. 9, PP. 938-962 (2000)

Setting Theoretical and Empirical Foundations for Assessing
Scientific Inquiry and Discovery in Educational Programs

Paul Zachos,

Thomas L. Hick,

William E.J. Doane,

Cynthia Sargent

Troy High School, Troy, New York

New York State Education Department

Association for the Cooperative Advancement of Science and Education



Stillwater Central Schools, Stillwater, New York

Received 22 January 1999; accepted 12 June 2000


Abstract: This study was designed to develop measures of student competence
in conducting scientific inquiry. Two assessment techniques were developed. The
first measures Scientific Inquiry Capabilities, variables which are indicators of
diverse aspects of competence in conducting scientific inquiry. The second
measures Scientific Discovery, an indicator of success in attaining scientific
concepts as a result of direct investigations into natural phenomena. Thirty-two
high school students were presented with tasks requiring the building and testing
of logical-mathematical models of natural phenomena. The relationship between
each Scientific Inquiry Capability and success in making discoveries was tested.
Several Inquiry Capabilities were identified as strongly correlated with success
in Discovery, notably: Proportional Reasoning, the Coordination of Theory with
Evidence, and the Disposition to Search for Necessary Underlying Principles.
© 2000 John Wiley & Sons, Inc. J Res Sci Teach 37: 938-962, 2000

Many science education programs in the United States are adopting the
goal of developing competence in the conduct of scientific inquiry in
response to recent recommendations from national advisory panels
(American Association for the Advancement of Science, 1990, 1993;
National Research Council, 1996). However, there are no generally
recognized, systematic methods for assessing whether this goal has been
achieved (Doran, Lawrenz, & Helgeson, 1994; Tamir, 1993). A validated
instrument for assessing students' competence in conducting scientific
inquiry would provide a basis for planning, monitoring, and facilitating
student learning. Such an instrument would also permit comparisons of
the relative efficacy of instructional programs dedicated to the
development of scientific inquiry capabilities. This paper sets
foundations for the development of such an instrument.

Correspondence to: P. Zachos



One factor contributing to the absence of a generally recognized
method for assessing competence in conducting inquiry is the diversity
of ideas concerning what constitutes scientific inquiry. Therefore, we
begin by proposing and testing a method for empirically defining
competence in the conduct of scientific inquiry. It is a method that
puts no constraints on the existing diversity of ideas but rather
provides procedures for empirically evaluating proposed measures of
scientific inquiry. The logic underlying this method is as follows: if
the goal of scientific inquiry is the discovery of laws, principles,
rules etc. which account for natural phenomena (Popper, 1959), then
those dispositions, skills, and ways of reasoning that contribute to
success in making such discoveries can legitimately be considered
scientific inquiry capabilities. Thus, any proposed characteristic of
scientific inquiry can be validated by demonstrating its contribution to
success in making scientific discoveries. We carry this one step further
into the educational domain and suggest that those inquiry capabilities
which have a demonstrated relationship to the discovery of scientific
concepts have a strong claim to being included in a curriculum of
scientific inquiry. This reasoning provides an empirical basis for what
Schwab called "grounds for a curriculum decision" (1966, p. 31).
Defining Critical Terms
The terms scientific inquiry and scientific discovery have many
different meanings. For this reason, a substantial part of the effort in
this research is directed to defining these terms for use in educational
programs and ultimately defining them operationally. The definitions are
intended to be faithful to the context of actual scientific research, to
the history and philosophy of science, and, at the same time, relevant
for science education practice. We hope to show that our definitions meet
these criteria, that they are not arbitrary stipulations but are based
on sound logical and empirical grounds.
'Science' - The Great Ambiguity
The term Science itself is ambiguous. On the one hand, Science refers
to the systematic processes by which we carry out investigations of the
world. On the other it refers to the findings (facts, laws, principles
etc.) that result from these investigations. This distinction is
regularly blurred in popular discussion. In the present study the
distinction is enhanced by using the term Scientific Inquiry to stand
for the processes of scientific investigation and Scientific Discovery
to stand for the growth of scientific conceptualizations of phenomena.
A Focus On Inquiry Capabilities - The Great Controversy
The distinction between science processes and science outcomes is
reflected in an ongoing controversy between those who believe that the
teaching of process skills should be primary in science education and
those who believe that teaching content should be primary (Norris, 1997;
Robinson, 1968; Schwab, 1969). A finer distinction can be made for the
domain of education revealing that there are, in fact, three broad
classes of goals in science education (DeBoer, 1991,
p.191; Welch, Klopfer, Aikenhead, &
Robinson, 1981). These are:
1. The development of intellectual skills
2. The understanding and appreciation of the methods and values of
science
3. The mastery of science facts, concepts, and principles.
The present study deals exclusively with the attainment of the first
goal which traditionally has been referred to by terms such as: science
process skills, problem solving skills, the scientific method, scientific thinking, critical
thinking, and reflective thinking. Here the development of these intellectual
characteristics will be referred to as the development of Scientific Inquiry
Capabilities. Two broad reasons are typically given for developing these
capabilities (Gauld, 1982):
1. Training for would-be scientists, and
2. Education for nonscientists in effective ways to deal with the

world.

The traditional focus in science education on the development of
scientific inquiry capabilities goes back at least to the beginning of
our century. Prominent examples can be found in the work of Dewey (1910,
1933), Conant (1947), Bruner (1960), Schwab (1960), Suchman (1961),
Gagne (1963), and various educational applications of the empirical
studies of Piaget and the Genevan school (Lawson, 1985). Two educational
programs that put heavy emphasis on the development of scientific
inquiry capabilities were Science: A Process Approach, AAAS (1967) and the
Biological Science Curriculum Study, Sarther, 1991.
Programs aimed at developing scientific inquiry capabilities have
been criticized (Hodson, 1990; Norris, 1997) for:
- Vague educational objectives,
- Poorly defined components,
- Absence of objective assessment,

and in general for being unable to substantiate claims for student
attainment. It is important to address these criticisms and the present
research does so by:
a) presenting procedures for empirically identifying and defining
scientific inquiry capabilities to be assessed,
b) presenting an assessment instrument that is composed of scales, the
highest levels of which are performance statements of educational
objectives.

The present research proceeds by defining scientific inquiry and
discovery, first theoretically based on a review of relevant literature,
and then empirically using operational definitions of these two
constructs.

Review of the Literature


What Is Scientific Discovery?
Two Aspects to Scientific Discovery - Building and Testing Concepts. In his book,
The Logic of Scientific Discovery (1959) Popper proposes a normative
description of what scientists do as a foundation for discourse on the
logic of scientific discovery:
. . . the work of the scientist consists in putting forth and testing theories

and again,

A scientist, whether theorist or experimenter, puts forward
statements, or systems of statements, and tests them step by step.
In the field of the empirical sciences, more particularly, he
constructs hypotheses, or systems of theories, and tests them
against experience by observation and experiment. (p. 27)

Popper's definitions suggest that there are two types of
activities involved in scientific discovery. We will refer to them as
"concept-building" and "concept-testing," using "concept" to stand
for the hypotheses and statements which become the rules, laws, and
principles of science when they are empirically confirmed.
Concept-building is the process of generating and proposing models,
formulations, and explanations concerning the nature of phenomena. It
requires inductive, creative, and imaginative activities (Nersessian,
1999). By contrast, Concept-testing is concerned with the evaluation
of hypotheses and theories through logical and empirical means. It
involves systematic hypothetical-deductive reasoning and the formal
application of what Popper calls "methodological rules" (1959,
Chapter II).
The acceptance of the distinction between these two aspects of
scientific thought is wide- spread. Klahr and Dunbar (1988) refer to
them as ""hypothesis formation'' and ""experimental design.'' Holland and
his associates'' . . . construe the topic of scientific discovery . . . to
include processes by which laws and theories are initially conceived, as
well as processes by which their acceptance is justified,'' (Holland,
Holyoak, Nisbett, & Thagard, 1986, p. 320). Feynman (1993) refers to
them as ""imagination'' and ""experiment''. Wilson refers to them as
theory-building and theory-confirming. Langley, Simon, Bradshaw, and
Zytkow (1987) refer to them as discovery (""how to make the induction'')
and verification (""how to justify the induction once
it has been
made'') (p. 14). They point out that the history of science concentrates
on the building of theories which have constituted scientific
revolutions (see Kuhn, 1970) while the philosophy of science, typified
by Popper, concerns itself with the methods used in theory testing.
As Holland et al. (1986) point out, these deductive and inductive
sides of science interact (e.g., one may engage in informal predictions
and mental tests of a hypothesis while engaged in the process of
developing it). Yet, in essence, they are distinct - so much so that
Popper devotes his entire discussion to one and declares the other to be
outside of the domain of concern. In doing so, he presents a concise
characterization of the distinction:
The initial stage, the act of conceiving or inventing a theory,
seems to me neither to call for logical analysis nor to be
susceptible of it. The question how it happens that a new idea
occurs to a man - whether it is a musical theme, a dramatic
conflict, or a scientific theory - may be of great interest to
empirical psychology; but it is irrelevant to the logical
analysis of scientific knowledge. This latter is concerned not
with questions of fact . . . but only with questions of
justification or validity. Its questions are of the following
kind. Can a statement be justified? And if so, how? Is it
testable? Is it logically dependent on certain other statements?
Or does it perhaps contradict them?
. . . Accordingly I shall distinguish sharply between the process of
conceiving a new idea, and the methods and results of examining it
logically. (p. 31)

A complete definition of Scientific Discovery then must encompass
both the building and testing of representations of classes of
phenomena. What makes these representations scientific is that they have
been tested experimentally and confirmed. The form that such
representations take, starting as early as Archimedes but not formalized

until the work of Galileo and Newton, has been a mathematical one. This
is expressed in a radical form by Neumann (1963):

The sciences do not try to explain, they hardly even try to
interpret, they mainly make models. By a model is meant a
mathematical construct which with the addition of certain verbal
interpretations describes observed phenomena. The justification of
such a mathematical construct is solely and precisely that it is
expected to work. (p. 491)

Personal vs. Cultural Discoveries


The discoveries made by scientists are sometimes spoken of as
"historical," or as "cultural," discoveries. They constitute historical
occasions when new knowledge was established for the whole culture,
i.e., for humankind as a whole. The discoveries made by students that
are discussed in this research are not cultural discoveries. They may be
called "personal' discoveries but they are like those made by scientists
in that they involve the building and testing of concepts concerning
phenomena. Building and testing concepts which successfully account for
a phenomenon requires intuitive leaps, rigorous methods of design,
observation, measurement, and inference on the part of students, just as
it did for those individuals who engaged in the historical discovery. In
fact, one of the discovery tasks that we administer is a task that
Galileo posed to himself when researching the factors which affect the
period of a pendulum. In this paper the terms concept and conceptualization
are used to stand for the various forms of representation of natural
laws including hypotheses, models, rules, and principles, that
constitute the attainment of scientific knowledge. "Scientific
Discovery" then is the successful result of building and testing
conceptualizations of the phenomenal world through empirical inquiry.
Our use of the term discovery refers exclusively to the self-attained
grasp of a phenomenon through building and testing concepts as a result
of inquiry into that phenomenon. "Scientific Inquiry Capabilities" are
the means towards attaining that grasp.
What Are Scientific Inquiry Capabilities?
In order to define scientific inquiry, a broad search was conducted
for human cognitive characteristics that could be construed as elements
of competence in conducting scientific inquiry and could at the same
time serve as appropriate goals for secondary school science education
programs.
Sources included the works of natural scientists,
philosophers, historians of science, and cognitive and educational
researchers. Another important source was the science education
literature. The aim of the search was to identify a broad range of
characteristics that might serve as the basis for constructing a battery
of competencies. To assure wideness of representation, two strategies
were adopted. One was to look for items of knowledge and dispositions
(e.g., attitudes, values, habits) as well as skills. The other was to
use a flow-chart model of the processes underlying scientific discovery
to assure that capabilities related to all phases of building and
testing concepts were represented in the final collection. The model
used to serve this latter purpose was Clement's (1989) model of
hypothesis development and model construction displayed in Figure 1.
Finally, advice from classroom science teachers who were consultants to
the research project was solicited. The search revealed a wide variety
of techniques, values, modes of reasoning, and dispositions that are
believed to characterize scientific inquiry.
The final collection of capabilities chosen for this study,
accompanied by citations to relevant sources, is presented in Figure 2.
There was no attempt to make the list exhaustive in this initial study.
Many important scientific inquiry capabilities, for example,
capabilities related to estimation in measurement (Jones & Rowsey, 1990;
Joram, Subrahmanyam, & Gelman, 1998), have not been included in the
initial collection. Our method, however, provides procedures for
incorporating these and other inquiry capabilities into future research
where the strength of their contribution to success in the discovery of
scientific concepts can be tested.

Figure 1. Cycle of conjecture, evaluation and modification or rejection in
hypothesis development and model construction. (Note. From "Learning via Model
Construction and Criticism," by J. Clement, 1989, In G. Glover, R. Ronning, & C.
Reynolds (Eds.), Handbook of creativity: Assessment theory and research, p. 347.)
The collection displayed in Figure 2 is most appropriately thought of
as a list of "Proposed Scientific Inquiry Capabilities" awaiting
validation, i.e., awaiting a demonstrated relation to success in
Discovery, before receiving the formal designation of Scientific Inquiry
Capabilities. However, for the sake of brevity in discussion we will
refer to them simply as Scientific Inquiry Capabilities or, for extreme
brevity, as "SICs."
Methods
This section provides the rationale and course of development of the
various procedures that were incorporated into our study for the purpose
of defining Scientific Inquiry and Scientific Discovery in a way that
the relationship between them could be empirically studied with
secondary school students.
Structured Inquiry
One critical element of our work is the interview method used to
elicit students' performance in both inquiry capabilities and concept
attainment. Piaget (1972) and Easley (1974) have both explained how
conventional ""tests'' fail to indicate students' maximum performance
in these areas. The ""clinical method/interview'' was developed to
address these failings; it provides a sensitive, interactive setting in
which to assess students' cognitive facilities and concepts (Ginsberg,
Kossan, Schwartz, & Swanson, 1983; Piaget, 1972; Posner & Gertzog,
1982).

Figure 2. Proposed scientific inquiry capabilities.

The early use of the clinical method, however, was often limited by
exclusive reliance on verbal reports from students. When Piaget began
his collaboration with Barbel Inhelder, in studies such as The Growth of
Logical Thinking: from childhood to adolescence (1958), empirical "tasks"
became a standard feature of his research. Participants' cognition was
studied in the presence of an object of inquiry common to both the
researcher and the participant. The virtue of having an empirical task
as the focus of a clinical interview is that it provides a common
referent for participant, researcher, and scientific audience. The
protocol then becomes a "concurrent verbal report" (Ericksson & Simon,
1996) referring to an objective event rather than an attempt to make
inferences simply from subjects' verbalizations.
In order to maximize the authenticity and generalizability of our
findings we decided to use tasks in which participants worked directly
with natural phenomena instead of simulations. Working directly with
natural phenomena is often considered ""messy'', since too many
variables are involved to allow for conventional methods of control
needed
to
test highly specific hypotheses concerning student
performance. In our experience, however, it is this very messiness that
provides the rich environment necessary to elicit the full range of
scientific inquiry capabilities and the participants' attempts to grasp
and explore the parameters of the phenomenon. Capabilities such as
Concern for Precision of Measurement, the Search for Necessary
Underlying Principles, the application of Ratios and Proportions to the
task, Goal Oriented Observation, Consideration of the Relative Value of
Empirical Evidence and others require a living interaction with natural
phenomena to be elicited and
engaged.
Studies of scientific inquiry capabilities such as those of Kuhn,
Amsel and O'Loughlin (1988), which are composed of contrived tasks that
are carefully structured to test hypotheses about specific scientific
inquiry capabilities, while laudable in many respects, are necessarily
constrained in their generalizability because their focus is removed
from the complexity of concerns and activities that characterize genuine
scientific investigations. Furthermore, since participants in such
studies are given the concepts they are to work with, preformed by the
researcher, the opportunity for participants to build concepts is not
available.
One protocol of particular value was found in Lochhead's study, "On
learning to balance perceptions by conceptions: a dialogue between two
science students" (1979). Lochhead presented a complete transcript of
a student inquiry into a natural phenomenon mediated and made manifest
through clinical interview. The conditions provide a rich context that
allows the participant to build and test concepts, and allows the
observer to study the emergence and engagement of the participant's
scientific inquiry capabilities and conceptualizations.
Using Lochhead's study as a model, we developed an interview method
that we call Structured Inquiry, which serves a number of functions:
1) Students are presented with a natural phenomenon and a task
related to that phenomenon, but without being provided with the
researcher/interviewer's conceptualization of the phenomenon.
2) The interviewer ensures that the student completely understands the
task.
3) The interviewer elicits the student's prior knowledge related to
the phenomenon as a baseline for assessing growth in
conceptualization.
4) The interviewer elicits student's hypotheses, methods of inquiry,
reasoning and knowledge, and dispositions related to the
phenomenon during the course of the inquiry session.

During a Structured Inquiry session, the interviewer or "inquiry
guide" first presents the task and then refrains from active
participation in the investigation or commentary on that investigation.
This is intended to prevent inadvertently giving task-relevant cues.
The guide poses questions to elicit students' comments regarding their inquiry
strategy (SICs) and levels of conceptualization, encourages the student
to "think aloud" (Ericksson & Simon, Chapter 2), but says nothing which
would suggest a conceptualization of the phenomenon or a scientific
inquiry capability that could be applied to the task.
Although participants' investigations are not supported directly by
the guide, the tasks themselves are constructed in such a way that
participants constantly receive feedback as to whether their
conceptualizations and strategies are working from their direct
interaction with the phenomenon. Whether they travel a productive path
or not, they are always faced with the consequences of their thought and
action. Moreover, the tasks (described below), although very simply
stated, are constructed so that the participants must build and test
logical-mathematical models of the phenomenon to succeed. In other
words, they must do what a scientist does in conceptualizing and
investigating a phenomenon.
Tasks
Three phenomena were identified to serve as a basis for studying the
ability to discover scientific concepts - Floating and Sinking, the Period
of Oscillation of a Pendulum, and Equilibrium on a Beam Balance. These
phenomena were chosen because they are almost universally part of high
school curricula and because their conceptualizations have great
generalization and application. Groundwork for charting levels of
conceptualization of these phenomena was set by Inhelder and Piaget in
their classic study of adolescent reasoning (1958). The primary
difference between our study and Inhelder and Piaget's is that we
present the task in a way that provides participants with the resources
and opportunity to test as well as to formulate concepts regarding the
phenomena. For example, the floating and sinking task as presented by
Inhelder and Piaget permits participants to formulate explanations for
floating and sinking behavior of objects in liquids but does not permit
them to conduct the necessary tests for the relationships of relative
densities which are central to a successful conceptualization of the
task. Our tasks thus provide opportunities for both the building and
testing of concepts. This, we believe, greatly increases the opportunity
for growth in conceptualization of the phenomena. In accord with Piaget
and Inhelder's developmental guidelines (1969) we worked with students
who were 14 years of age or over to assure that maturational development
was relatively complete, and consequently, that variation in performance
could be attributed more to social-educational than to
physiological-developmental factors. The three tasks were administered as follows:
Task 1: Equilibrium on the Balance Beam. Participants were presented with a
wooden beam balance specially constructed for this study (Figure 3). On
each side of the beam there was a hook which could slide from close to
the fulcrum to the end of the beam on that side of the fulcrum. Sets of
labeled weights were provided that could hang on the hooks. The beam was
plain wood with no markings to indicate distance. Participants were
given two different weights (one of them already placed on the beam) and
told that their goal was to figure out where to place the second weight
so that the beam would be balanced. The task was presented using
unlabeled 40 g fishing weights, specially prepared so that they would
hang from the hooks.
In this task the participant is presented with the following goal:
"Your goal is to figure out where to place any combination of
weights that I might give you so that the beam will stay
balanced."
"For instance I may give you 5 weights to put on one side and 2
weights to put on the other."

Figure 3. The beam balance and weights for the balance beam task.

To perform successfully on this task, participants had to formulate a
rule encompassing a comparison of torques (each being a product of the
force of a mass and its distance from the fulcrum) and an adjustment of
these torques in order to create an equilibrium. Torque is an invisible
variable. Although it can be intuited physically, it must be formulated
mathematically to be conceptualized as a rule that would account for the
phenomenon.
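The torque comparison that successful participants arrive at can be written as a single rule; the sketch below is our own illustration of that rule (the function name and example values are ours, not the study's):

```python
def balancing_distance(weight_fixed, distance_fixed, weight_free):
    """Where to hang the free weight so the beam balances.

    Balance occurs when the two torques are equal:
        weight_fixed * distance_fixed == weight_free * distance_free
    so distance_free = weight_fixed * distance_fixed / weight_free.
    Any consistent units for weight and distance will do.
    """
    return weight_fixed * distance_fixed / weight_free

# The "5 weights on one side and 2 on the other" example, using the
# 40 g fishing weights: if the 5-weight side hangs 20 cm from the
# fulcrum, the 2-weight side must hang 50 cm out.
print(balancing_distance(5 * 40, 20, 2 * 40))  # -> 50.0
```

Note that the rule makes no reference to where the fulcrum markings are; the participant must construct the distance variable as well as the product.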
Task 2: Floating and Sinking. Participants were presented with three
2-liter beakers, each containing a different liquid - alcohol, water, and
salt water (see Figure 4). They were also presented with a cube and
asked to predict whether it would float or sink in each liquid. The cube
was one of a set of unmarked cubes specially constructed for this study
which systematically varied in volume and density, giving all the
components needed to confirm density relationships relevant to floating
and sinking in these conditions. Task administrators were given a
reference table that indicated the relative densities of the objects and
the liquids.
In this task the participant is presented with the following goal:
"Your goal is to figure out how to predict whether an object
will float or sink." "Imagine that I am going to give you a
cube like one of these and ask you to predict correctly whether
it will float or sink."

The relationship of two factors (volume and density) must be
established in order to realize this goal, i.e., to formulate a rule that
will predict whether any given cube will float or sink. The
determination of whether an object will float or sink requires the
coordination and comparison of two densities - that of the object and that
of the liquid. Densities are ratios of mass and volume. Mass may be seen
as a phenomenal attribute but volume is already an operation, a
coordination of measurements.

Figure 4. Equipment for the floating and sinking task.

The comparison of these densities requires a
coordination of mathematical representations of forces underlying the
phenomenon that is several steps removed from simple phenomenal
attributes. In addition, an adequate conceptualization of this
phenomenon requires the formulation of a hypothetical construct which
has no observable correlate, i.e., the conceptualization of a volume of
liquid equal to the volume of the object whose buoyancy is under
consideration.
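The density comparison at the heart of the task reduces to a single inequality; a minimal sketch of that rule (the liquid densities below are our rough assumed values, not figures from the study's reference table):

```python
def floats(object_density, liquid_density):
    """An object floats when its density is below the liquid's.

    Densities in g/cm^3; the cube's volume cancels out of the
    comparison, which is part of what makes the rule hard to discover.
    """
    return object_density < liquid_density

# Approximate densities (assumed values): alcohol ~0.79, water 1.00,
# salt water ~1.10.
liquids = {"alcohol": 0.79, "water": 1.00, "salt water": 1.10}

# A cube of density 0.90 g/cm^3 sinks in alcohol but floats in the
# other two liquids, whatever its size.
predictions = {name: floats(0.90, rho) for name, rho in liquids.items()}
print(predictions)
```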
Task 3: The Period of the Pendulum. In this task a weight was suspended
from the end of a string attached to a hook on the top of an open
doorway (see Figure 5). The inquiry guide pulled back the weight and
released it. Participants were told that the resulting swinging object
is called a pendulum. They were asked to observe the time it took the
hanging weight to travel out and return to its point of origin. This
period of time was identified as "the period of the pendulum."
Participants were presented with the following goal:
"Your goal is to figure out how to construct a pendulum with a given
period."
"For instance, imagine that I will ask you to construct a pendulum
which has a period of 3 seconds or some other number of seconds."

Figure 5. Setup for the pendulum task.

The key to realizing this goal is elucidating the role of a single
phenomenal variable - the length of the string. The identification of that
variable is typically reached only after a period of inference and
testing involving the competing explanations for the cause of the period.
These alternative hypotheses may include the mass of the hung object, the
height from which it is held, and how hard it is pushed. The
relationship between length and period is nonlinear (the period varies
directly with the square root of the length). This requires either an
algebraic or a graphic solution.
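The square-root relationship comes from the simple-pendulum formula T = 2π√(L/g); inverting it gives the length needed for a target period. A sketch under that small-angle idealization (which is standard physics, though the study itself only notes the square-root dependence):

```python
import math

def pendulum_length(period_s, g=9.81):
    """Length in meters of a simple pendulum with the given period.

    Inverts T = 2*pi*sqrt(L/g) to L = g * (T / (2*pi))**2, assuming
    the small-angle (simple pendulum) idealization.
    """
    return g * (period_s / (2 * math.pi)) ** 2

# The 3-second period posed to participants calls for a pendulum
# roughly 2.24 m long.
print(round(pendulum_length(3.0), 2))  # -> 2.24
```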
Each of the three tasks was brought to a close after 1 hour of
inquiry.
The Setting-Contents of the 'Laboratories' for Structured Inquiry
Supplies, materials, and equipment for testing concepts on all three
tasks were combined and placed on a table in the room where the inquiry
sessions took place. Another table was available for the equipment
participants were using and as a place to keep notes. Participants were
told that if they needed anything additional to work on their task that
an effort would be made to obtain it for them. Some of the things that
participants asked for remained part of the ongoing set of materials and
equipment.
Support and reference materials for the three tasks are provided on
the website.
Participants
Thirty-two paid volunteer participants (21 female, 11 male) from both
a rural and an urban high school in upstate New York participated in the
study. About half of the participants responded to a call from their
science teachers to take part in some "fun" science experiments
during their summer vacation. The other half were chosen by science
teachers to make the sample more diverse with respect to academic
aptitude and interest in science. One participant was 14 years old, five
were 15 years old, fourteen were 16 years old, and twelve were 17 years
old. None had previously taken a physics course in high school.
Participants were told ahead of time that they would be paid $5 per
hour ($15 for all three tasks) for participating.
Operational Definition of Scientific Discovery
In order to assess growth in conceptualization of the phenomena,
Discovery scales were developed. The Discovery scales are composed of
levels which represent increasingly adequate conceptualizations of
the phenomenon of interest. Each of the steps or levels on these
scales is called a "Level of Conceptualization" (LOC). The lowest
Levels of Conceptualization correspond to common intuitions
concerning these phenomena, and are typically referred to as
"misconceptions" or "alternate conceptions" of phenomena (Confrey,
1990; Helm & Novak, 1983; Novak, 1987). The highest level of each
scale is a logical-mathematical formulation that adequately accounts
for the phenomena presented. This accords with Neumann's (1963)
characterization of the nature of scientific theories presented
earlier.
The ends of the scales are joined by intermediate levels which were
built on Inhelder and Piaget's (1958) developmental studies of
competence in conceptualizing the three phenomena of interest. A
special property of these scales is that as the levels grow higher they
constitute an increasingly adequate basis for making predictions
relating to the phenomena. In other words, the levels of
conceptualization are ordered in terms of their relative power in
accounting for the phenomenon of interest. Because the higher levels of
conceptualization provide a more adequate accounting of the phenomenon
under consideration, the ordering is not dependent on the personal
judgments of teachers, researchers or other experts. A sample scale for
assessing Discovery related to the concept of torque on a beam balance
is presented in Figure 6.

Figure 6. Levels of Conceptualization on a sample Discovery scale.
Discovery was operationally defined as the growth from an Initial
Level of Conceptualization (ILOC) to a Final Level of
Conceptualization (FLOC). A special form of this definition for
statistical hypothesis testing by using multiple regression analysis
will be presented in the discussion on analysis below. However, simply
formulating a concept cannot be considered sufficient to constitute a
scientific discovery. The concepts have to be tested and verified.
Consequently, in accord with the notion that scientific discovery
incorporates both the building and the testing of concepts, two
constructs were developed: the Level of Conceptualization Stated by the
participant and the Level of Conceptualization Confirmed by the
participant. The Level of Conceptualization Confirmed is a level for
which the participant is able to provide supporting evidence through
demonstration. Therefore, Scientific Discovery is more properly thought
of as growth in the Levels of Conceptualization Stated and Levels of
Conceptualization Confirmed (LOCs STAT & LOCs CONF) during the course of
inquiry into a phenomenon. The failure to take initial measures of Level
of Conceptualization Confirmed, however, precluded us from including
this variable in the subsequent regression analyses. The scales in
combination with their associated tasks as administered through the
process of Structured Inquiry constitute the operational definitions of
the Discovery construct for each task.
Operational Definition of Scientific Inquiry Capabilities
Performance scales were developed for 32 proposed characteristics of
scientific inquiry based on the review presented in Figure 2. The
guiding principle for development of the inquiry scales was to represent
a range of capability from novice to expert performance (Chi, Feltovich,
& Glaser, 1981; Chi, Glaser, & Rees, 1982). A sample scale for measuring
a Scientific Inquiry Capability is shown in Figure 7. The scales of
Scientific Inquiry Capabilities along with the procedures for eliciting
and rating performance on these scales, to be presented below, constitute

Figure 7.

A sample Inquiry scale.

the operational definitions of these capabilities. The highest levels of


these scales can be considered as educational objectives. Properly, we
should think of these constructs as measures of "Proposed" Scientific
Thinking Capabilities until their relationship to success in making
discoveries has been confirmed and replicated in empirical studies such
as the one described here.
Special Features of These Methods
These research procedures have a number of characteristics which we
believe to be of value to both cognitive research and science
education:
- Participants are engaged in direct inquiry into a natural
  phenomenon; they are not engaged in a simulation exercise.
- Participants must build and test concepts; none are given to them by
  the researcher.
- The concepts needed to grasp the phenomenon are of great
  generalizability.
- The participant has a living relationship to the phenomenon and
  nature rather than one mediated by the researcher or by a
  presupplied representation of the phenomenon.
- The Inquiry Guide has a living relationship with the participant and
  freedom to pursue any line of questioning which elicits the
  Participant's concepts and strategies without contaminating them
  through direction or suggestion.

Scoring Procedures
Raters. The structured inquiry sessions were videotaped. This provided
a rich record of the inquiry session that could be reviewed repeatedly
and used for both assessment and training. Judgments of participant
levels of conceptualization and participant levels of performance on
each of the inquiry capabilities were made, from the videotapes, by 13
different individuals who were referred to as Raters. Raters were former
or current physical science teachers at the secondary school level,
practicing scientists, a mathematics teacher, a philosopher, and a
doctoral level student of education.

Training. Prospective raters participated in a training session to
learn the procedures for scoring participant performance. Each rater
was given a handbook which contained:
(a) guidelines for the scoring process, (b) scales and scoring forms for
each of the Scientific Inquiry Capabilities, and (c) scales and scoring
forms for each of the three tasks. These resource materials are
available on our website.
The first step in training was an introduction to the purpose of the
research and a description of the scales and scoring procedures. This
was followed by several intensive sessions in which the trainer and
prospective raters watched videotapes of guided inquiry sessions that
demonstrated varying levels of competence on the three tasks. Whenever a
viewer identified a scorable moment (i.e., an instance of evidence
concerning the absence or presence of a Scientific Inquiry Capability or
a Level of Conceptualization of the phenomenon) the videotape was
stopped and ratings were recorded. The viewers then compared their
ratings (or lack thereof) and discussed discrepancies in rating. The
videotape segment could be replayed to question or confirm ratings. In
this way a general understanding of the meaning of the scales was
achieved. Typically, those portions of the video in which participants
were speaking contained scorable instances. Even periods of silence on
the videotape could contain scorable instances (e.g., SIC#12 Consults
Recorded Notes). Often there was a period of time when no new scoring
could take place because a stable Level of Conceptualization had been
reached and a particular Scientific Inquiry Capability was being
constantly applied. Such a period provided ample opportunities for
confirming the last rating that had been made. In some cases where
several minutes passed without scorable instances, the videotape was
stopped and the scoring guidelines were reviewed to see if an
opportunity for a scorable instance may have been
missed.
Scoring. Raters attended one or more training sessions for the purpose
of familiarizing themselves with the scoring process. Raters took video
copies of the inquiry sessions home with them to conduct analyses. They
were also given copies of notes made by participants in the course of
inquiry sessions. The trained raters then viewed the video tapes and
used the scales of Scientific Inquiry and Discovery to characterize the
performance of these participants on the inquiry tasks. They viewed
videotapes of the inquiry sessions, recorded initial and final levels of
conceptualization on each of the criterion variables, and the highest
level of attainment demonstrated on each of the scientific inquiry
abilities.
To eliminate inadvertent bias on the part of the raters, Levels of
Conceptualization and Scientific Inquiry Capabilities for each task were
always scored by different individuals. In addition, no rater scored
more than one LOC and one set of SICs for the same participant. This
doubled the magnitude of the scoring task but greatly simplified the
task of the raters by allowing them to focus either upon scientific
inquiry processes or upon level of conceptualization but not both. Thus
no rater made judgments on the inquiry capabilities and levels of
conceptualization for a participant on the same task, and no rater made
judgments on more than one set of inquiry capabilities or one instance
of levels of conceptualization for the same participant. This procedure
protected against contamination of the Discovery variables by knowledge of
Scientific Inquiry variables and vice versa. Preventing such
contamination is critical because the crux of the study was to establish
the nature of the relationship between these variables. Thus, raters
were assigned so that correlations between Scientific Inquiry
Capabilities and Levels of Conceptualization would be free of rater
bias.
Analysis
When scoring of the videotaped records of Structured Inquiries was
completed, an analysis procedure was applied to establish the extent of
the link between Discovery and competence in conducting scientific
inquiry, that is, between growth in Levels of Conceptualization and
performance on the measures of Scientific Inquiry
Capabilities. This was achieved through an analysis of covariance using
regression models. Tests for the effects of each of the Scientific
Inquiry Capabilities on each of the Discovery variables were conducted
by means of hypotheses placed on regression models in which the Final
Level of Conceptualization served as the criterion. The Final LOC alone
does not constitute an indicator of Discovery, however, because it does
not take into account growth or change in the participant's
conceptualization over the course of the inquiry session. In order for
the analysis to have the property of representing Discovery, the
participant's Initial Level of Conceptualization is included in the
model as a control variable. The role of the ILOC as a covariate in the
models is to account for variation in the criterion that can be
attributed to the participant's conceptualization at the beginning of an
inquiry session. This entry-level conceptualization is construed as a
measure of prior knowledge of the phenomenon. Additional variables
included in the regression model must then account for the criterion
performance over and above what is accounted for by prior knowledge. An
inquiry variable that accounts for criterion variance in the Final Level
of Conceptualization, beyond what is already explained by the Initial
Level of Conceptualization, can be seen as accounting for growth or
change, or, in other words, as accounting for Discovery. All the tests
of relationships between proposed scientific inquiry abilities and
success in discovering scientific concepts followed this analysis model.
A detailed description of the analysis model is presented on the
website.
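The model comparison described above can be sketched in code. The following is an illustrative reconstruction with fabricated scores, not the study's actual data or analysis script: a simulated Final LOC is regressed on the Initial LOC alone, then on the Initial LOC plus one SIC rating, and the increment in explained variance is attributed to the capability.

```python
import numpy as np

def r_squared(X, y):
    # ordinary least squares fit; X must include an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

rng = np.random.default_rng(0)
n = 32                                       # participants in the study
iloc = rng.integers(0, 4, n).astype(float)   # Initial LOC (prior knowledge)
sic = rng.integers(0, 5, n).astype(float)    # rating on one inquiry capability
# simulated Final LOC: depends on prior knowledge and on the capability
floc = iloc + 0.5 * sic + rng.normal(0.0, 0.7, n)

ones = np.ones(n)
r2_reduced = r_squared(np.column_stack([ones, iloc]), floc)    # ILOC only
r2_full = r_squared(np.column_stack([ones, iloc, sic]), floc)  # ILOC + SIC

# partial r^2: criterion variance the SIC explains beyond prior knowledge
partial_r2 = (r2_full - r2_reduced) / (1 - r2_reduced)
# F statistic for the single added predictor (df = 1 and n - 3)
f_stat = (r2_full - r2_reduced) / ((1 - r2_full) / (n - 3))
print(round(partial_r2, 3), round(f_stat, 2))
```

Because the models are nested, the full model's R² can never fall below the reduced model's, so the increment is a clean measure of what the inquiry capability adds over the covariate.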
Results
Performance on the Discovery Tasks
Growth in Level of Conceptualization over the course of an inquiry
session can be illustrated by looking at the cross-tabulations of
Initial and Final Levels of Conceptualization for one of the tasks. As
shown in Table 1, twenty out of the thirty-two participants increased in
their Stated Level of Conceptualization for the Floating and Sinking
task. Table 1 presents results only from the Floating and Sinking task.
Similar findings characterize the other two
tasks.
While the study was underway our attention was called to the
possibility that participants were growing in their ability to
conceptualize and apply the notion of density during the floating and
sinking task. They were apparently "discovering" the concept of density
because they needed it in order to succeed in the task. As evidence
supporting the hypothesis of personal discovery rather than something
recalled from instruction, it was noted that some participants
operationalized density as a ratio of volume to mass, rather than the
ratio of mass to volume (i.e., M/V) that is typically taught in school.
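Note that for the purpose of predicting floating and sinking the inverted ratio is just as serviceable: an object floats when its mass-to-volume ratio is below the liquid's, which is exactly when its volume-to-mass ratio is above the liquid's. A small check of this equivalence (the numbers are illustrative, not participant data):

```python
def floats_mv(m_obj, v_obj, m_liq, v_liq):
    # density as mass/volume: the object floats if its density is lower
    return m_obj / v_obj < m_liq / v_liq

def floats_vm(m_obj, v_obj, m_liq, v_liq):
    # inverted ratio, volume/mass: the object floats if its ratio is higher
    return v_obj / m_obj > v_liq / m_liq

# (object mass, object volume, liquid mass, liquid volume)
cases = [(50, 100, 100, 100), (120, 100, 100, 100), (80, 60, 100, 100)]
assert all(floats_mv(*c) == floats_vm(*c) for c in cases)
```

Either operationalization yields identical predictions, which may help explain why the inversion did not block participants' success on the task.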
It seemed of value to consider the contribution of competence in
Scientific Inquiry Capabilities to this growth in conceptualization of
density as well. For this reason, we constructed two additional
Discovery scales-one for the concept of Density of Solid Objects (OBJ
DENS) and another for Density of Liquids (LIQ DENS). Videotape records
for each participant were re-examined to note the Initial and Final
Levels of Conceptualization of these variables and tests on the effects
of Scientific Inquiry Capabilities were conducted on each.
The patterns of growth in Level of Conceptualization from Initial to
Final Level are summarized in Table 2. For all but one participant, the
Final Levels of Conceptualization are equal to or greater than the
Initial Level of Conceptualization, indicating substantial growth in
Level of Conceptualization of the phenomenon, i.e., Discovery.

Table 1
Floating and Sinking Task; Cross tabulation of Initial and Final Levels of Conceptualization

Table 2
Growth in Level of Conceptualization on all tasks

                     Amount of change in LOC
Task                 -1    0    1    2    3    4    5    6   Mean    N
Balance beam          0    4    5    5    5   10    2    1   2.69   32
Floating & sinking    0   12   10    2    4    4    0    0   1.31   32
Pendulum              1   10    9    6    4    0    2    0   1.31   32
Object density        0   18    6    1    0    2    3    2   1.34   32
Liquid density        0   25    2    0    1    1    2    1   0.78   32

Tests For Effects of Proposed Scientific Inquiry Abilities on Discovery


Hypotheses concerning the effects of individual Scientific Inquiry
Capabilities on Discovery were tested using least squares models of the
data as described in Introduction to Linear Models (Ward & Jennings,
1979). The models were composed of measures on three variables:
1. Final Level of Conceptualization Stated for a given Discovery task.
2. Initial Level of Conceptualization on that task.
3. One of the proposed Scientific Inquiry Capabilities.

A formal description of the model testing procedure is provided on the
website.
The regression models used to test the effects of inquiry
capabilities make use of the correlation between Initial and Final
Levels of Conceptualization. In the Floating and Sinking Task proper,
for example, this correlation is moderately high (r = .72). This means
that 52% (r² = .52) of the variance in the criterion can be considered accounted
for by the Initial Level of Conceptualization. As much as 48% of the
variation in the Level of Conceptualization Stated thus remains to be
accounted for. Separate tests were conducted for the effect of each of
the Scientific Inquiry Capabilities on this unaccounted-for portion of
variation in the Final Level of Conceptualization. This was done for
each of the five criterion variables.
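The variance partition in this example is simple arithmetic worth making explicit; the partial r² reported for any capability is then its share of this leftover portion:

```python
# Floating and Sinking task: correlation of Initial with Final LOC
r = 0.72
explained = r ** 2         # variance accounted for by prior knowledge
remaining = 1 - explained  # variance left for inquiry capabilities to explain
print(round(explained, 2), round(remaining, 2))  # → 0.52 0.48
```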
Table 3 shows the results of the tests for effects of each of the
Scientific Inquiry Capabilities on each of the Discovery variables.
Asterisks are placed alongside significant r² values. A single asterisk
alongside a table entry indicates that an r² value of that magnitude
could have occurred strictly by chance about 5% of the time. In
actuality 28% of the hypotheses tested achieved significance at this
probability level (p < .05). A double asterisk alongside a table entry
signifies that the r² value could have occurred strictly by chance
about 1 time out of 100. In actuality 13% of the hypotheses tested
achieved significance at this probability level (p < .01). Similarly,
6% of the hypotheses tested were significant at a level of .001 or
less, that is, at a level that would be expected to occur by chance
less than 1 time in a thousand.
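With the 29 capabilities in Table 3 tested against five Discovery variables, 145 hypotheses were examined, so these percentages can be set against what chance alone would yield. A rough back-of-the-envelope check (it assumes independent tests, which the correlated measures do not strictly satisfy):

```python
n_tests = 29 * 5             # SIC scales x Discovery variables
chance_05 = 0.05 * n_tests   # significant results expected by chance alone
observed_05 = 0.28 * n_tests # results actually significant at p < .05
print(n_tests, chance_05, round(observed_05))
```

Roughly 7 chance hits would be expected at the .05 level, against about 41 observed, which is why the pattern of results is treated as substantive rather than accidental.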
The first entry in Table 3 can be used as an example to illustrate
the interpretation of these results. The r² value for SIC #1 indicates
the effect of that Scientific Inquiry Capability (Uses Covariation as a
Basis for Inferring Causality) on Discovery in the Balance Beam Task.
The table shows an r² value of .07. This suggests an effect of SIC #1
accounting for 7% of the Discovery Variance in the Balance Beam Task.
However, since the associated probability value is larger than .05, the
results are not significant and no effect is
established.
A review of the whole table shows that the disposition to Search for
a Necessary Underlying Principle (SIC#14) and the application of
Proportional Reasoning (SIC #27) were strong predictors of success on
all of the discovery tasks. These two capabilities have demonstrated a
close relationship to Discovery on all of the phenomena and tasks.
Coordinates Theory with Evidence (SIC #4) was able to account for
Discovery on three tasks. Formulates Composite Variables (SIC # 8) and
Identifies Sources of Error in Taking Measurements (SIC #19) were strong
predictors of success on the three Discovery variables in the Floating
and Sinking Task. The Recording of Observations (SIC #12) and the extent
of use of these recorded observations in the inquiry process (SIC #13)
were significantly related to Discovery on more than one task.
Dispositions such as Concern for Verification (SIC #22) and the Valuing
of Empirical Evidence (SIC #23) were related to Discovery on more than
one Discovery variable. SIC #22 showed significant effects on all three
tasks.
Although the r² values may appear small it must be remembered that
they are squared correlations and so correspondingly smaller than the
original correlation values. Also they are partial correlations;
variance associated with the Initial Level of Conceptualization (i.e.,
variance attributable to prior knowledge) has already been removed for
the purpose of statistical control. These partial r² values represent
explanatory power, over and above what has already been accounted for by
the Initial Level of Conceptualization. Each is an indicator of the
magnitude of the explanatory effect of a Scientific Inquiry Capability
on Success in the Discovery of a Scientific Concept (i.e., the effect of
an SIC on the Final LOC with control for the Initial LOC).
Discussion
What does all this mean in more practical terms? For example, what
behaviors differentiate more successful students from those who are less
successful on the Discovery tasks? The key to answering these questions
is the nature of the performance on those Scientific Inquiry
Capabilities that were most strongly related to Discovery. This
performance is specified in
the


Table 3
Values of r² for effects of Scientific Inquiry Capabilities on Discovery

SIC                                                          BAL     FLO     OBJ     LIQ     PEN
                                                                             DENS    DENS
SIC #1: Uses covariation as a basis for inferring
        causality                                            0.07    0.04    0.06    0.02    0.00
SIC #2: Uses absence of covariation as evidence of no
        relationship                                         0.07    0.07    0.06    0.05    0.02
SIC #3: Varies factors to establish the presence or
        nature of a relationship                             0.01    0.00    0.03    0.00    0.00
SIC #4: Coordinates theory with evidence                     0.22**  0.08*   0.11*   0.06    0.10
SIC #5: Generates and uses analogies in conceptualizing
        phenomena                                            0.03    0.00    0.02    0.00    0.06
SIC #6: Applies physical intuition                           0.01    0.00    0.00    0.00    0.02
SIC #7: Identifies potential causal factors and generates
        variables to represent factors                       0.01    0.04    0.05    0.04    0.06
SIC #8: Formulates composite variables                       0.03    0.19*** 0.57*** 0.55*** 0.03
SIC #9: Reasoning concerning extreme cases                   0.00    0.00    0.01    0.03    0.00
SIC #10: Goal oriented observation                           0.06    0.05    0.23**  0.14*   0.00
SIC #11: Records observations                                0.13*   0.01    0.13*   0.05    0.00
SIC #12: Consults recorded notes                             0.24**  0.01    0.10*   0.05    0.00
SIC #13: Innovation concerning task materials                0.04    0.01    0.04    0.01    0.01
SIC #14: Searches for a necessary underlying principle       0.28*** 0.13**  0.22*   0.22**  0.19**
SIC #15: Searches for parsimony                              0.08    0.01    0.00    0.00    0.16*
SIC #16: Internally consistent in explanation                0.02    0.03    0.08    0.05    0.03
SIC #17: Concerned for accuracy of language                  0.08    0.03    0.01    0.01    0.00
SIC #18: Concerned with precision of measurement             0.12*   0.05    0.18**  0.12*   0.00
SIC #19: Identifies sources of error in taking
         measurements                                        0.03    0.14**  0.28*** 0.27*** 0.00
SIC #20: Uses techniques for precise measurement             0.20**  0.00    0.09    0.05    0.09
SIC #21: Reaction to disconfirmation                         0.04    0.00    0.00    0.01    0.09
SIC #22: Concerned for verification                          0.14*   0.02    0.12*   0.07    0.17*
SIC #23: Considers relative value of empirical evidence      0.09    0.10**  0.11*   0.09    0.01
SIC #24: Controls variables                                  0.01    0.03    0.15*   0.07    0.00
SIC #25: Makes unsolicited predictions                       0.03    0.03    0.11*   0.06    0.00
SIC #26: Uses predictions to test hypotheses                 0.05    0.01    0.12*   0.09    0.00
SIC #27: Uses proportional reasoning                         0.29*** 0.08*   0.25*** 0.24*** 0.22**
SIC #28: Classification                                      0.05    0.01    0.10*   0.03    0.01
SIC #29: Seriation                                           0.02    0.01    0.15*   0.06    0.01

* = Significant at .01 < p < .05
** = Significant at .001 < p < .01
*** = Significant at p < .001


levels of the SIC scales. Recall that the scales are ordered to
represent increasing competence with regard to some aspect of knowledge,
skill or disposition related to scientific inquiry. The lowest level
(Level 0) indicates that no appearance of the capability was found over
the course of the Structured Inquiry session. The higher levels
represent increasing movement towards expert levels of performance. In
many cases the highest level is recognized when the participant shows an
awareness of the nature and importance of that capability for scientific
inquiry as well as experiencing success in applying the capability.
While the complete set of scales is available on the website, it may be
valuable for illustrative purposes to consider performance on those
Scientific Inquiry Capabilities that were most strongly related to
Discovery:
SIC #14, The Search for a Necessary Underlying Principle.
Participants who scored highly were those who gave evidence that they
were seeking a rule, law or formula that could account for variations in
the phenomenon they were observing.
SIC #27, Proportional Reasoning. Participants had to successfully
coordinate relationships between two or more ratios related to the task,
for example, to coordinate the ratio of mass and distance on one side of
the balance beam to the ratio of mass and distance on the other side.
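The relation being coordinated here is the torque rule: the beam balances when the mass × distance product is equal on both sides, so doubling a mass can be offset by halving its distance. A minimal sketch of the prediction a high scorer could make (the values are illustrative, not task data):

```python
def balances(m_left, d_left, m_right, d_right):
    # the beam balances when the two torques (mass * distance) are equal
    return m_left * d_left == m_right * d_right

# proportional reasoning: 2 units at distance 10 balance 4 units at distance 5
assert balances(2, 10, 4, 5)
# but not 4 units at distance 6
assert not balances(2, 10, 4, 6)
```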
SIC #4, Coordinating Theory with Evidence. High performance on this
scale required the participant to use data (evidence) to evaluate their
concept (theory), or to use their concept to evaluate the data that they
had collected. The highest level of performance on this scale required
that they give evidence of doing both.
SIC #8, Formulates Composite Variables. Participants had to create a
variable which is a function of two or more other variables (e.g.,
torque, volume, density) and use this variable to investigate the
phenomenon under consideration.
SIC #19, Identifies Sources of Error in Taking Measurements. This
scale required
participants to spontaneously (i.e., without prompting by the inquiry
guide) suggest potential sources of error related to their measurements
(e.g., inconsistency in application of a measurement instrument,
inadequacies of the instrument, bias).
In general, students who were strong in the competencies measured by
the scales of Scientific Inquiry Capabilities were more successful in
making discoveries. All significant correlations between competency in
conducting scientific inquiry and growth in concept attainment were
positive.
Summary
We have presented methods for operationally defining Scientific
Inquiry Capabilities and Discovery for use in science education
programs. These methods include scales for assessing competence in
conducting disparate aspects of scientific inquiry and scales for
assessing progress in conceptualizing natural phenomena. A procedure
called Structured Inquiry was also presented. Structured Inquiry is a
method for administering tasks and eliciting participant performance
related to the targeted cognitive capabilities. This method has the
desirable features of facilitating a rich interaction between the
participant and the natural world, of eliciting the participant's
relevant cognitive characteristics, and of keeping the researcher's
judgment processes independent of the participant's.
A procedure was presented for empirically validating the proposed
Scientific Inquiry Capabilities against the criterion of Discovery using
multiple regression analysis. The procedure was able to identify a

number of capabilities that were strongly related to Discovery, some on
more than one discovery task.
It was demonstrated that progress in personal discovery of important
scientific concepts, that is, success in building and testing concepts
related to these phenomena, can be accomplished and studied in
relatively short periods of time. The results indicate that secondary
school students are capable of building and testing conceptualizations
(e.g., density) that often require substantial periods of instruction
in school settings.
It is hoped that the methods presented here will be of value to
historians and philosophers of science as well as to those interested
in developing competence in the conduct of scientific inquiry.
Future Research
There are several steps that should be taken to follow up on the
present study:
1. Continued improvement in the quality of existing measures of
scientific inquiry and discovery, leading to greater ease and
flexibility of administration and greater validity and reliability.
2. Development of additional measures of scientific inquiry and discovery.
3. A study to establish a causal link between scientific inquiry
capabilities and discovery. If instruction in specific scientific
inquiry capabilities can be shown to result in improved ability to
build and test concepts across a range of phenomena, there would be
increased justification for developing those capabilities in
educational settings. Establishing such a causal relationship requires
an experimental study.
4. Consideration of how Scientific Inquiry capabilities might jointly
contribute (e.g., interact) to effect discovery. This could include
looking for effects of SICs at the transitions between specific Levels
of Conceptualization on a Discovery scale. This line of research
requires continued theoretical development and identification of
appropriate analytical techniques for testing emerging hypotheses.
5. Use of the assessment of Scientific Inquiry Capabilities to
construct a profile of the student who is more successful in
making scientific discoveries.

Educational Implications
In designing this study we constrained ourselves to be in harmony
with the following principles in order to facilitate the applicability
of the research to teaching situations:
- The constructs used in the study were designed to be understandable
  by and useful to teachers in classroom learning situations.
- The levels of competence on the scales of Scientific Inquiry
  Capabilities were stated as performance objectives so that they
  could translate immediately into intended learning outcomes.
- The study was constructed so that participants would be learning as
  their capabilities were being assessed. In fact, participants were
  actively constructing knowledge throughout the process of Structured
  Inquiry.
- All phases of the study were reviewed by a panel of teachers who
  regularly made suggestions which resulted in valuable improvements.
Through the present research, a method has been presented that can be
used to empirically justify the inclusion of specific capabilities in a
curriculum of scientific inquiry. The method includes procedures (e.g.,
scales, rating procedures, and ways to elicit student performance) that
can be used to assess the attainment of these capabilities. Furthermore,
the assessment procedures can be applied to the evaluation of
instructional programs that are aimed at developing these capabilities.

We have presented the prototype of instruments that assess competence
in various facets of scientific inquiry. In a more highly developed
form, we envision teachers who will become familiar with, in fact, who
will have internalized, the scales which represent levels of competence
in the most important capabilities. With this capacity a teacher could
diagnose in any instructional setting (classroom discussion, student
essays, laboratory work, homework assignments, etc.) a student's level
of attainment of a given capability, say, Using Co-variation as a Basis
for Inferring Causality. The teacher could then deliver an appropriate
instructional move for that student in context and at a critical moment
based on knowledge of that student's level of competence.
Conclusion
It is our hope and goal that just as in our research the participant
has a living relationship with natural phenomena and the inquiry guide
strives to have a living relationship with the participant, so, in the
educational applications of this research, students will be helped to
have a living relationship with the natural world and a living
relationship with their science
teachers.
Supplementary materials, including data collection instruments,
instructions for task administration, scoring guides, raw data, and
data analysis procedures, are available online at
http://www.acase.org/jrst

References
American Association for the Advancement of Science. (1967). Science:
A process approach. Washington D.C.: AAAS.
American Association for the Advancement of Science. (1990). Science
for all Americans.
New York: Oxford University Press.
American Association for the Advancement of Science. (1993).
Benchmarks for science literacy. New York: Oxford University Press.
Bruner, J.S. (1960). The process of education. New York: Random
House.
Bruner, J.S., Goodnow, J., & Austin, G. (1956). A study of thinking.
New York: John Wiley & Sons.
Campbell, N.R. (1957). Foundations of science. New York: Dover.
Chi, M.T.H., Feltovich, P.J., & Glaser, R. (1981). Categorization and
representation of physics problems by experts and
novices. Cognitive
Science, 5,
121-152.
Chi, M.T.H., Glaser, R., & Rees, E. (1982). Expertise in problem
solving. In R.J. Sternberg (Ed.), Advances in the psychology of human
intelligence (Vol. 1, pp. 7 - 75). Hillsdale, NJ: Lawrence Erlbaum
Associates.
Clement, J. (1988). Observed methods for generating analogies in
scientific problem solving. Cognitive Science, 12, 563 - 586.
Clement, J. (1991). Informal reasoning in experts and in science
students: The use of
analogies, extreme cases, and physical intuition. In J. Voss, D. Perkins,
& J. Siegel (Eds.), Informal reasoning and education. Hillsdale, NJ:
Lawrence Erlbaum
Associates.
Clement, J. (1989). Learning via model construction and criticism. In
G. Glover, R. Ronning, & C. Reynolds (Eds.), Handbook of creativity:
Assessment, theory & research (pp. 341 - 381). New York: Plenum.

Conant, J.B. (1947). On understanding science. New York: Mentor.


Confrey, J. (1990). A review of the research on student conceptions
in mathematics, science, and programming. In C. Cazden (Ed.), Review of
research in education, (Vol. 16, pp. 3 - 55). Washington, D.C.: American
Education Research Association.

Dawson, C.J., & Rowell, J.A. (1986). All other things being equal: A
study of science graduates solving control of variables problems.
Research in Science and Technological Education, 4, 49 - 60.
DeBoer, G.E. (1991). A history of ideas in science education:
Implications for practice. New York: Teachers College Press.
Dewey, J. (1910). How we think. Lexington, MA: D.C. Heath.
Dewey, J. (1933). The process and product of reflective activity:
Psychological process and logical forms. In J. Boydston (Ed.), Later
works of John Dewey (Vol. 8, pp. 171 - 186). Carbondale: Southern
Illinois University Press.
Doran, R., Lawrenz, F., & Helgeson S. (1994). Research on assessment
in science. In E. Gabel (Ed.), Handbook of research on science teaching
and learning. New York:
Macmillan.
Easley, J.A. Jr. (1974). The structural paradigm in protocol
analysis. Journal of Research in Science Teaching, 11, 281 - 190.
Einstein, A. (1935). The world as we see it. London: John
Lane.
Ericsson, K.A., & Simon, H. (1996). Protocol analysis. Cambridge, MA:
The MIT Press.
Feynman, R.P. (1993). Six easy pieces. New York: Addison-Wesley.
Gagne, R.M. (1963). The learning requirements for enquiry. Journal of Research in Science Teaching, 1, 144 - 153.
Gauld, C. (1982). The scientific attitude and science education: A critical reappraisal. Science Education, 66, 109 - 121.
Giancoli, D. (1986). The ideas of physics. San Diego, CA: Harcourt Brace Jovanovich.
Ginsburg, H.P., Kossan, N.E., Schwartz, R., & Swanson, D. (1983).
Protocol methods in research on mathematical thinking. In H. Ginsburg
(Ed.), The development of mathematical thinking (pp. 7 - 47). New York:
Academic Press.
Helm, H., & Novak, J. (Eds.), (1983). Proceedings of the
international seminar: Misconceptions in science and mathematics.
Ithaca, NY: Department of Education, Cornell University.
Hesse, M. (1966). Models and analogies in science. Notre Dame, IN:
University of Notre Dame Press.
Hodson, D. (1990). A critical look at practical work in school science. School Science Review, 70(256), 33 - 40.
Holland, J.H., Holyoak, K.J., Nisbett, R.E., & Thagard, P.R. (1986).
Induction: Processes of inference, learning, and discovery. Cambridge,
MA: MIT Press.
Inhelder, B., & Piaget, J. (1958). The growth of logical thinking
from childhood to adolescence. New York: Basic Books.
Jones, M.L., & Rowsey, R.E. (1990). The effects of immediate
achievement and retention of middle school students involved in a metric
unit designed to promote the development of estimating skills. Journal
of Research in Science Teaching, 27(9), 901 - 913.
Joram, E., Subrahmanyam, K., & Gelman, R. (1998). Measurement estimation: Learning to map the route from number to quantity and back. Review of Educational Research, 68(4), 413 - 449.
Karplus, R., Pulos, S., & Stage, E. (1983). Proportional reasoning of
early adolescents. In
R. Lesh & M. Landau (Eds.), Acquisition of mathematical concepts and
processes (pp. 45 - 90). Orlando, FL: Academic Press.

Klahr, D., & Dunbar, K. (1988). Dual space search during scientific reasoning. Cognitive Science, 12, 1 - 48.
Klopfer, L.E. (1971). Evaluation of learning in science. In B. Bloom,
J.T. Hastings, & G.F. Madaus (Eds.), Handbook on formative and summative
evaluation of student learning. New York: McGraw Hill.

Kuhn, D., Amsel, E., & O'Loughlin, M. (1988). The development of scientific thinking skills. New York: Academic Press.
Kuhn, T. (1961). The function of measurement in modern physical science. Isis, 52, 161 - 193.
Kuhn, T. (1970). The structure of scientific revolutions. Chicago: The University of Chicago Press.
Langley, P., Simon, H.A., Bradshaw, G.L., & Zytkow, J.M. (1987). Scientific discovery: Computational explorations of the creative process. Cambridge, MA: MIT Press.
Lawson, A.E. (1985). A review of research on formal reasoning and science teaching. Journal of Research in Science Teaching, 22, 569 - 617.
Lawson, A.E. (1990). Use of reasoning to a contradiction in grades three to college. Journal of Research in Science Teaching, 27(6), 541 - 551.
Lesh, R., Post, T., & Behr, M. (1988). Proportional reasoning. In J. Hiebert & M. Behr (Eds.), Number concepts and operations in the middle grades (Vol. 2, pp. 93 - 118). Reston, VA: National Council of Teachers of Mathematics; Lawrence Erlbaum Associates.
Lochhead, J. (1979). On learning to balance perceptions by
conceptions: A dialogue between two science students. In J. Lochhead &
J. Clement (Eds.), Cognitive process instruction (pp. 147 - 178).
Philadelphia, PA: The Franklin Institute Press.
Lunetta, V.N., & Tamir, P. (1979, May). Matching lab activities with
teaching goals. The Science Teacher, 46, 22 - 24.
National Research Council. (1996). National science education standards. Washington, DC: National Academy Press.
Nersessian, N. (1999). Model based reasoning in conceptual change. In L. Magnani, N. Nersessian, & P. Thagard (Eds.), Model-based reasoning in scientific discovery. New York: Plenum.
Neumann, J. von (1963). Method in the physical sciences. In Collected works (Vol. 6, pp. 491 - 498). Oxford: Pergamon Press.
Noelting, G. (1980). The development of proportional reasoning and the ratio concept: Part I - Differentiation of stages. Educational Studies in Mathematics, 11, 217 - 231.
Norris, S. (1984). Defining observational competence. Science Education, 68, 129 - 142.
Norris, S. (1985). The philosophical basis of observation in science and science education. Journal of Research in Science Teaching, 22, 817 - 833.
Norris, S. (1997). Intellectual independence for nonscientists and
other content-transcendent goals of science education. Science Education, 239 - 258.
Norris, S., & King, R. (1984). Observation ability: Determining and extending its presence. Informal Logic, vi, 3 - 9.
Novak, J. (Ed.). (1987). Proceedings of the second international seminar: Misconceptions in science and mathematics. Ithaca, NY: Department of Education, Cornell University.
Piaget, J. (1972). The child's conception of the world. Littlefield, Adams & Co.

Piaget, J. (1980). Experiments in contradiction. Chicago, IL: University of Chicago Press.
Piaget, J., & Inhelder, B. (1969). The psychology of the child. New York: Basic Books.
Popper, K. (1959). The logic of scientific discovery. New York: Basic Books.
Posner, G., & Gertzog, W. (1982). The clinical interview and
measurement of conceptual change. Science Education, 66, 195 - 209.
Robinson, J.T. (1968). The nature of science and science teaching.
Belmont, CA: Wadsworth.
Sarther, C.M. (1991). Science curriculum and the BSCS revisited. Teaching Education, 3(2), 101 - 108.

Schwab, J. (1960). What do scientists do? Behavioral Science, 5, 1 - 27.
Schwab, J. (1966). The teaching of science as enquiry. In J. Schwab & P.F. Brandwein (Eds.), The teaching of science (pp. 1 - 103). Cambridge, MA: Harvard University Press.
Schwab, J. (1969). Enquiry, the science teacher, and the educator. In H.O. Andersen (Ed.), Readings in science education for the secondary school (pp. 181 - 191). New York: Macmillan.
Suchman, J.R. (1961). Inquiry training: Building skills for autonomous discovery. Merrill-Palmer Quarterly, 7, 147 - 169.
Tamir, P. (1993). Focus on assessment. Journal of Research in
Science Teaching, 30(6), 535 - 536.
Ward, J.H. Jr., & Jennings, E. (1979). Introduction to linear models. Englewood Cliffs, NJ: Prentice Hall.
Welch, W.W., Klopfer, L.E., Aikenhead, G.S., & Robinson, J.T. (1981).
The role of inquiry in science education: Analysis and recommendations.
Science Education, 65, 33 - 50.
Wilson, V. (1987). Theory-building and theory-confirming observation
in science and science education. Journal of Research in Science
Teaching, 24, 279 - 284.
Wise, M.N. (1995). The values of precision. Princeton, NJ: Princeton University Press.