
International Journal of Science Education

ISSN: 0950-0693 (Print) 1464-5289 (Online) Journal homepage: https://www.tandfonline.com/loi/tsed20

The Effects of Inquiry Teaching on Student Science Achievement and Attitudes: Evidence from
Propensity Score Analysis of PISA Data

Feng Jiang & William F. McComas

To cite this article: Feng Jiang & William F. McComas (2015) The Effects of Inquiry
Teaching on Student Science Achievement and Attitudes: Evidence from Propensity Score
Analysis of PISA Data, International Journal of Science Education, 37:3, 554-576, DOI:
10.1080/09500693.2014.1000426

To link to this article: https://doi.org/10.1080/09500693.2014.1000426

Published online: 13 Jan 2015.

International Journal of Science Education, 2015
Vol. 37, No. 3, 554 – 576, http://dx.doi.org/10.1080/09500693.2014.1000426

The Effects of Inquiry Teaching on Student Science Achievement and Attitudes: Evidence from
Propensity Score Analysis of PISA Data
Feng Jiang^a,* and William F. McComas^b
^a Office of Innovation for Education, University of Arkansas, Fayetteville, AR, USA;
^b Department of Curriculum and Instruction, University of Arkansas, Fayetteville, AR, USA

Gauging the effectiveness of specific teaching strategies remains a major topic of interest in science
education. Inquiry teaching among others has been supported by extensive research and
recommended by the National Science Education Standards. However, most of the empirical
evidence in support was collected in research settings rather than in normal school environments.
Propensity score analysis was performed within the marginal mean weighting through
stratification (MMW-S) approach to examine the effects of the level of openness of inquiry
teaching on student science achievement and attitudes with the Programme for International
Student Assessment (PISA) 2006 data. Weighting subjects on MMW-S weight successfully
balanced all treatment groups on all selected covariates. Significant effects were identified on
both cognitive and attitudinal outcomes. For student science achievement, the highest score was
achieved at Level 2 inquiry teaching, that is, students conduct activities and draw conclusions
from data. For student science attitudes, higher levels of inquiry teaching resulted in higher
scores. These conclusions generally held in most PISA 2006 participating countries when
the analysis was performed in each country separately.

Keywords: Inquiry-based Teaching; Learning; Attitudes

*Corresponding author. Office of Innovation for Education, University of Arkansas, 309 WAAX,
Fayetteville, AR 72701, USA. Email: fjiang@uark.edu

© 2015 Taylor & Francis

Introduction
Gauging the effectiveness of applying any specific teaching strategy to meet a particular
learning goal remains a major topic of interest in science education. For instance, given the
continuing focus on the use of inquiry teaching, measuring its impact on important educational
outcomes is a worthwhile goal. Inquiry teaching has been supported by extensive research, is
widely endorsed by the science education community (e.g. Anderson, 2007; Treagust, 2007;
Windschitl, 2008), and is recommended by the National Science Education Standards (National
Research Council, 1996). However, the rationale of inquiry teaching, particularly open inquiry,
has also been questioned by some educational psychologists (Kirschner, Sweller, & Clark, 2006).
Although the effectiveness of inquiry teaching has been supported by empirical studies (Minner,
Levy, & Century, 2010), most of the evidence was collected in research settings rather than in
school environments. Jiang and McComas’s (2012) study of the Programme for International Student
Assessment (PISA) data revealed an unexpected negative correlation between the use of student
investigations and student science achievement, thus providing the impetus for the present study.
The term inquiry teaching has been used to describe many dramatically different
teaching practices, all of which involve student decision-making to one degree or
another. However, it is wise not to assume that all versions of inquiry would
have the same impact on student science learning and related attitudes. An important
distinction in inquiry teaching is between guided inquiry and more ‘open’ varieties.
While the distinctions between types of inquiry are well known, the impacts of
these distinct forms have not been carefully examined, particularly in real school set-
tings. Therefore, the purpose of the present study is to examine the effect of different
levels of inquiry teaching on student science achievement and attitudes with the PISA
data, through the lens of causal inference provided by the application of propensity
score analysis.

Review of Literature
Empirical Evidence Supporting Inquiry Teaching
Inquiry in science teaching plays at least two different roles. First, inquiry as a practice
engaged in by scientists should be taught as part of the science curriculum. Second,
and perhaps more importantly, inquiry should be used as a pedagogical tool
through which students can learn science content and practice through experiencing
the process of inquiry itself. The National Science Education Standards (National
Research Council, 1996) describes inquiry as a set of science practices:
Inquiry is a multifaceted activity that involves making observations; posing questions;
examining books and other sources of information to see what is already known; planning
investigations; reviewing what is already known in light of experimental evidence; using
tools to gather, analyze, and interpret data; proposing answers, explanations, and predic-
tions; and communicating the results. Inquiry requires identification of assumptions, use
of critical and logical thinking, and consideration of alternative explanations. (p. 23)

Formerly, inquiry teaching was often connected to the concept of ‘discovery teaching.’ This was
the case in Wise and Okey’s (1983) meta-analysis of science teaching strategies. Wise and Okey
simply call inquiry teaching a ‘more student-centered and less step-by step teacher directed
learning experience’ (p. 421). Such an orientation is reflective of the philosophy of
constructivist educational theory, so it was not a surprise to see Kirschner et al. (2006)
consider inquiry teaching, discovery, and constructivism as essentially the same. Unfortunately,
the label ‘minimally guided instructional approach’ that they stamped on such teaching
strategies is problematic (Hmelo-Silver, Duncan, & Chinn, 2007). Inquiry teaching is a broad
suite of strategies, but Kirschner et al. (2006) made the broad claim that there is ‘overwhelming
and unambiguous evidence’ that ‘minimally guided instruction’, which includes
inquiry teaching, ‘is significantly less effective and efficient than guidance specifically
designed to support the cognitive processing necessary for learning’ (p. 76). Evidence
in favor of inquiry from meta-analyses on inquiry teaching does not support their
assertion, as we will see in this section.
Before reviewing the evidence, it is necessary to provide some clarification about
why some frequently cited meta-analyses regarding inquiry are not included in the
following review. First, although four meta-analyses (Bredderman, 1983; Shymansky,
Hedges, & Woodworth, 1990; Shymansky, Kyle Jr., & Alport, 1983; Weinstein,
Boulanger, & Walberg, 1982) comparing the so-called innovative curricula to tra-
ditional curricula have been repeatedly cited for supporting inquiry teaching, they
lack a clear definition of inquiry teaching or inquiry-based curriculum. Moreover,
even if these curricula were based on inquiry principles, there is no guarantee that
they would be used effectively in classrooms. Further, schools that adopt those curricula could
be systematically different from those using traditional curricula, and this selection bias was
not handled in those meta-analytic studies. Second, although Lott (1983) seemed to address
inquiry teaching given the title of his meta-analysis, he considered inquiry teaching the same
as inductive teaching, a popular but inaccurate perspective. This is related to a common
misconception about the nature of science, which mistakenly reduces scientific inquiry to
inductive reasoning. Finally, Wise and Okey (1983) provided a meta-analysis for a variety of
teaching strategies including inquiry, but their analysis did not address the rigor of the
analyzed studies or give detailed information on inquiry teaching.
Fortunately, three more robust meta-analyses were published recently that are more
rigorous from both conceptual and methodological perspectives. These are con-
sidered as appropriate evidence for evaluating the impact of inquiry teaching.
Schroeder, Scott, Tolson, Huang, and Lee (2007) conducted a meta-analysis of US
research published from 1980 to 2004 with respect to the effect of eight science teach-
ing strategies, including inquiry teaching, on student achievement. Sixty-one studies
were analyzed and included based on the criteria that: they were carried out in the
USA, were experimental or quasi-experimental, and included effect size or the stat-
istics necessary to calculate effect size. Inquiry strategies were defined as those in
which ‘teachers use student-centered instruction that is less step-by-step and
teacher-directed than traditional instruction; students answer scientific research ques-
tions by analyzing data (e.g., using guided or facilitated inquiry activities, laboratory
inquiries)’ (p. 1446). Of the 61 analyzed studies, 12 focused on inquiry strategies. The effect
size of the inquiry strategies was 0.65. However, no more detailed information is provided
specifically for inquiry teaching, because the study analyzed several teaching strategies.
Furtak, Seidel, Iverson, and Briggs’s (2009) meta-analysis focused on experimental and
quasi-experimental studies of K-12 inquiry-based classroom science teaching published between
1996 and 2006. Based on Duschl’s (2003) conceptual framework of inquiry teaching, the authors
defined a four-faceted model for inquiry which includes conceptual, procedural, epistemic, and
social facets. Nine studies from six reports (with three replications) were included in their
analysis. In the nine studies, effect sizes ranged from −0.27 to 2.95, with two negatives and
seven positives. The researchers also performed specific analyses of the relationships between
effect size and other factors, such as the four facets, grade level, duration of treatment, and
teacher-led versus student-led conditions. Based on their findings, the researchers highlighted
two factors that could increase positive effect size: an emphasis on the epistemic facet of
inquiry and a longer exposure of students to inquiry instruction.
Minner et al.’s (2010) Inquiry Synthesis Project synthesized findings from research
conducted between 1984 and 2002 to determine the impact of inquiry science
instruction on K-12 student outcomes. The research team first developed a concep-
tual framework for inquiry science instruction, consisting of three aspects: (1) the
presence of science content, (2) student engagement with science content, and
(3) components of instruction. The components of instruction were defined in a
two-dimensional table of specifications. The first dimension contains three instruc-
tional domains: student responsibility for learning, student active thinking, and
student motivation. The second dimension contains five inquiry components: question,
design, data, conclusion, and communication. In the synthesis, 138 studies
were included, with 73 non-experimental studies, 35 quasi-experimental studies,
and 30 experimental studies. Within the analyzed studies, 51% showed positive
impacts of inquiry teaching on student learning. The researchers concluded that
the evidence of positive effects of inquiry-based instruction was not conclusive, and
the association between the level of inquiry saturation and student learning
outcome was also modest. Concurrently, they argued that there was a clear and posi-
tive trend favoring inquiry-based instructional practices when instruction emphasized
student active thinking and drawing conclusions from data. The researchers also rated
studies based on their methodological rigor, but they found methodological rigor was
not significantly related to the effect of inquiry teaching on student outcomes.
Based on these three meta-analyses, it is reasonable to conclude that the effect of
inquiry teaching is no worse and perhaps somewhat better than conventional
science teaching. Nevertheless, positive research evidence does not always guarantee successful
practice. Although the randomized controlled trial is considered the ‘gold standard’ in
methodology, the teaching strategy implemented in regular classrooms is not necessarily the same
as that implemented in instructional experiments. In a preliminary study, Jiang and McComas
(2012) identified a significant negative correlation between the use of student investigations
and student science achievement. It is useful to examine whether this negative correlation
constitutes a negative effect as seen through the lens of causal inference.

Methodology
The purpose of this study was to examine the effects of levels of openness in inquiry
teaching on student science achievement and their attitudes toward science. Because
the study was conducted on non-experimental assessment data, to establish a valid
causal inference, the propensity score analysis was performed within the marginal
mean weighting through stratification (MMW-S) approach (Hong, 2012). The specific research
question answered in this study is: What are the effects of the level of openness in inquiry
teaching on student science achievement and attitudes?

Measurement
This study involved the use of the PISA data from 2006. PISA is a worldwide evalu-
ation of 15-year-old school students’ scholastic performance in three domains:
reading, mathematics, and science. The assessment was first administered in 2000 and has been
repeated every three years. In each PISA administration, all three domains (reading, math, and
science) are assessed, and one domain is emphasized through the inclusion of an extensive set of
questions. In 2006, the PISA assessment emphasized students’ science literacy with the inclusion
of extensive survey items. This enabled analyses not possible with the more recent PISA 2009 and
2012 administrations, which did not have the science focus. This study involved the use of three
groups of variables:
(1) outcome variables regarding the measurement of students’ scientific achievement
and attitudes, (2) treatment variables regarding inquiry teaching activities, and
(3) covariates such as demographic information, socioeconomic status, school
characteristics, and student scholastic aptitudes.

Science Learning Outcomes in PISA. Science learning outcomes were assessed for
both cognitive and affective elements (Organisation for Economic Co-operation
and Development, 2006). Science competency is the central construct in the PISA
assessment and was defined in three subscales: identifying scientific issues, explaining
phenomena scientifically, and using scientific evidence. However, we did not conduct
analysis at the subscale level because these subscales are so strongly correlated with
coefficients greater than 0.9 that analyses of different subscales provide almost the
same results. The attitudinal science learning outcomes include three components:
interest in science, support for scientific inquiry, and responsibility toward resources
and environments. The last attitude measure—responsibility towards resources and
environments—was not included in the analysis because it was not a focus of the
study. Both the cognitive and attitudinal outcomes were analyzed. Therefore, this
study had three outcome measures including science achievement (overall science
competency), interest in science, and support for scientific inquiry.

Levels of Openness in Inquiry Teaching. National Research Council’s (1996) definition of inquiry
includes a variety of science practices, but it does not address how they should be implemented
in science teaching. Should students carry out all the science inquiry activities? Or what is
the appropriate level of students’ involvement in science inquiry activities? This issue of the
level of openness of inquiry teaching, or the amount of guidance provided to students in inquiry
teaching, was noted by Schwab (1962) and elaborated by Herron (1971). Shulman and Tamir (1973)
defined four levels of openness in inquiry teaching based on teacher and student invol-
vement in three inquiry components: concluding, designing, and questioning. We
added an inquiry component, conducting activities, to distinguish student hands-on activities
from teacher demonstrations, which are frequently used in classroom teaching. Therefore, the
four inquiry components in our framework are (1) conducting activities, (2) drawing conclusions,
(3) designing investigations, and (4) asking questions. They address four issues related to the
level of openness in inquiry teaching: (a) Do students conduct the activities by themselves?
(b) Do students draw conclusions from data by themselves? (c) Do students design the
investigation by themselves? and (d) Do students raise the question for investigation by
themselves?
Based on teacher and students’ involvement in the four inquiry components, we
defined five levels of openness in inquiry teaching (Table 1). More engagement from students
indicates a higher level of openness and less teacher guidance. A new level, ‘Level 0’, was
added to differentiate completely non-inquiry teaching from the other levels, which involve
varying degrees of inquiry teaching. Therefore, our definition of openness of inquiry teaching
starts from the least open version, where the teacher does everything. Control is gradually
released at the more open levels, where students are more fully involved in conducting
activities, drawing conclusions, designing investigations, and asking questions, until the most
open form of inquiry teaching, where students do everything.
In addition to measuring students’ science achievement and attitudes, PISA 2006
also collected students’ background information regarding science learning and
teaching. The types of science teaching activities used in the classroom were surveyed in the
student questionnaire. In question 34 (When learning school science topics at school, how often
do the following activities occur?), students were asked how frequently (never or hardly ever,
in some lessons, in most lessons, in all lessons) specific activities were used in science
teaching. Of the 17 items under this question, seven were directly related to inquiry teaching.

Table 1. Levels of openness in inquiry teaching

          Conducting   Drawing       Designing        Asking
          activities   conclusions   investigations   questions
Level 0   T            T             T                T
Level 1   S            T             T                T
Level 2   S            S             T                T
Level 3   S            S             S                T
Level 4   S            S             S                S

Note: S, given/conducted by student and T, given/conducted by teacher.

In the PISA data set, the frequencies of using the teaching strategies were calibrated
with Rasch models and a score was generated for each teaching strategy. However, it is
difficult to interpret such scale scores because each scale includes several items and
each item addresses different levels of inquiry teaching, such as students make own
conclusion, design own investigations, or choose own topics. It is hard to tell
whether a higher scale score indicates the use of higher level of inquiry teaching or
a higher frequency in using low-level inquiry. Therefore, the scaled scores provided
by PISA were not used in the study. Rather, the levels of inquiry teaching were
defined based on our conceptual framework.
The seven items related to inquiry teaching and the corresponding targeted inquiry components
are shown in Table 2. Of the seven items, items 1 and 2 target conducting activities; item 3
targets drawing conclusions; items 4 and 5 target designing investigations; and items 6 and 7
target asking questions. For each item, the frequency values are coded as: 1 = never or hardly
ever, 2 = in some lessons, 3 = in most lessons, and 4 = in all lessons. For the targeted inquiry
components that contain two questionnaire items, the frequency value is the average of the two.
For example, the frequency value of conducting activities is the average of items 1 and 2.
Based on the frequencies of the four inquiry components, we defined five levels of inquiry
teaching (see Table 3). The third level of frequency in the Likert scale, ‘in most lessons’, is
selected as the cutoff indicating sufficient implementation of each inquiry component. At Level
0, none of the inquiry components is sufficiently implemented; at Level 1, only conducting
activities is sufficiently implemented, but not drawing conclusions, designing investigations,
or asking questions; at Level 2, only conducting activities and drawing conclusions are
sufficiently implemented, but not designing investigations or asking questions; at Level 3,
conducting activities, drawing conclusions, and designing investigations are sufficiently
implemented, but not asking questions; and at Level 4, all the components are sufficiently
implemented.

Table 2. Selected inquiry teaching survey items in PISA 2006 for defining the levels of inquiry
teaching

Item  Statement                                                              Inquiry component
1     Students spend time in the laboratory doing practical experiments      Conducting activities
2     Students do experiments by following the instructions of the teacher   Conducting activities
3     Students are asked to draw conclusions from an experiment they have    Drawing conclusions
      conducted
4     Students are required to design how a school science question could    Designing investigations
      be investigated in the laboratory
5     Students are allowed to design their own experiments                   Designing investigations
6     Students are given the chance to choose their own investigations       Asking questions
7     Students are asked to do an investigation to test out their own ideas  Asking questions

Table 3. Five levels of inquiry teaching with the reported frequency values

Level     Conducting   Drawing       Designing        Asking
          activities   conclusions   investigations   questions
Level 0   ≤2           ≤2            ≤2               ≤2
Level 1   >2           ≤2            ≤2               ≤2
Level 2   >2           >2            ≤2               ≤2
Level 3   >2           >2            >2               ≤2
Level 4   >2           >2            >2               >2

Note: frequency: 1, never or hardly ever; 2, in some lessons; 3, in most lessons and 4, in all lessons.

We expected that some of the student reports would not fit the defined pattern. For example,
some students may report high-frequency use of designing investigations but low frequency of
conducting activities. The mismatch between student reports and the proposed pattern may have
two causes. First, some teachers may not use inquiry teaching properly; second, some students
may not report teaching activities correctly. Whatever the reason, those cases should be
excluded from the analysis of the effect of different levels of inquiry teaching on learning
outcomes.
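
The classification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `inquiry_level`, the dictionary layout of the responses, and the use of `None` for non-conforming patterns are hypothetical, and missing responses are not handled.

```python
from statistics import mean

# Item numbers follow Table 2; the grouping into components follows the text.
COMPONENT_ITEMS = {
    "conducting": (1, 2),
    "concluding": (3,),
    "designing": (4, 5),
    "questioning": (6, 7),
}
ORDER = ("conducting", "concluding", "designing", "questioning")
CUTOFF = 2  # per Table 3, a component is sufficiently implemented when its frequency is > 2


def inquiry_level(responses):
    """Return the level (0-4) for one student, or None if the pattern does not fit.

    `responses` maps item number (1-7) to the coded frequency (1-4).
    This layout is an assumption made for the sketch.
    """
    implemented = [
        mean(responses[item] for item in COMPONENT_ITEMS[name]) > CUTOFF
        for name in ORDER
    ]
    level = sum(implemented)
    # A conforming report implements the first `level` components and no others.
    if implemented != [True] * level + [False] * (4 - level):
        return None
    return level
```

For example, a student who answers 3 on items 1 to 3 and 1 on items 4 to 7 is assigned Level 2, whereas a student who reports frequent designing of investigations but rare hands-on activities receives `None` and would be excluded.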

Analysis
This study estimated the causal effect of the reported level of inquiry teaching on
student science achievement and their attitudes toward science. The levels of
inquiry teaching were considered as five categories of treatments in the causal analysis.
Previous analysis has found a negative correlation between student investigations and
science achievement (Jiang & McComas, 2012). In this case, the correlation does not
necessarily indicate causation because the treatments were not randomly assigned. As
an enhancement to the previous study to reduce selection bias and examine the causal
effect of inquiry teaching on science achievement and attitudes, the present study
involved the use of a propensity score analysis following the MMW-S approach
(Hong, 2012).
PISA 2006 is an international assessment administered in 57 countries (or regions), and the
practice and context of science teaching and learning are likely to differ among the
participating countries. Therefore, our analyses were conducted separately for the data from
each country. In other words, all the analyses treated each country as a standalone data set
and followed the same procedures of propensity score analysis. By conducting the analyses in
each country separately, we also examined whether the impact of inquiry teaching on science
achievement and attitudes is consistent across participating countries.

Propensity score analysis. The propensity score (Rosenbaum & Rubin, 1983) is the estimated
probability that a subject would be assigned to a particular treatment condition (e.g. a
particular level of inquiry teaching), given the subject’s characteristics with respect to the
covariates. By matching or weighting on propensity scores, selection bias can be significantly
reduced and valid estimation of causal effect can be established. Combining suggestions from the
recent literature (Caliendo & Kopeinig, 2008; Frölich, 2004; Spreeuwenberg et al., 2010; Stuart
& Rubin, 2008), the causal analysis consisted of the following major steps:
(1) Select the covariates. Choosing a large set of covariates is beneficial in reducing
selection bias (Stuart & Rubin, 2008). The covariates were selected based on
theory and the results of previous empirical studies, rather than depending on
the correlations in the current data set (Caliendo & Kopeinig, 2008). The pool
of covariates included variables regarding students’ demographic and socioeco-
nomic status, the cognitive characteristics of students, the attitudinal character-
istics of students, and school characteristics (Fan & Nowell, 2011). In this
study, 33 variables (see Tables 7 and 8) were included in the multinomial logistic
regression model to estimate the probability of a student being assigned to a
specific inquiry level. As for missing data in categorical covariates, missing
values were replaced with indicators. Missing data in continuous covariates
were imputed through maximum likelihood estimation.
(2) Estimate the propensity scores. Because the treatment is a five-level categorical
variable, the generalized propensity scores were estimated with multinomial logis-
tic regression (Imbens, 2000; Lechner, 2001). One set of propensity scores was
generated for each treatment level.
(3) Identify the common support. The region of overlap on propensity scores across
different treatment groups was identified and subjects outside the region were
dropped. Overlapping histograms were used to check visually whether the overlap
was sufficient.
(4) Stratify the subjects. The subclassification was performed for each set of propen-
sity scores separately by splitting the sample into strata based on the quantiles of
propensity scores. Although the numbers of strata are not necessarily the same
across different sets of propensity scores (Hong, 2012), we found that splitting
into six strata for all five sets of propensity scores simplified the process and
sufficiently balanced the treatment groups.
(5) Calculate the marginal mean weights. The marginal mean weight for each
stratum was calculated after subclassification. The weighting values were used
as sample weights in the following analyses. For subjects assigned to stratum
s in treatment group t, the MMW-S weight (Hong, 2012) is calculated as
follows:

$$W_{t,s} = \frac{N_t / N}{N_{t,s} / N_s}$$

where $N_{t,s}$ is the number of subjects in treatment group t in stratum s, $N_s$ is the
number of subjects in stratum s, $N_t$ is the number of subjects assigned to treatment t,
and $N$ is the total number of subjects. The weight is thus the marginal proportion of
subjects receiving treatment t divided by the proportion of subjects in stratum s who
received treatment t. To incorporate the complex sampling design of the PISA data, the
final weight was obtained by multiplying the sample weight by the MMW-S weight.
(6) Diagnose the balance. To establish reliable causal inference, it is necessary to
assess the balance among the treatment groups. The balance diagnostics involved
hypothesis testing on each covariate. Both weighted and unweighted analyses
were conducted to compare the balance before and after correction.
(7) Estimate the treatment effect. After a balanced sample was obtained, one-way
ANOVAs were conducted to determine the impact of inquiry teaching
on student science achievement and attitudes. The independent variable was
the treatment group, while the dependent variables were student science achievement,
interest in science, and support for scientific inquiry. This study differs
from conventional analysis because selection bias was substantially reduced by
weighting subjects on the MMW-S weight.
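
The stratification and weighting steps above can be illustrated with a short Python sketch. It is not the authors' code: the function names `quantile_strata` and `mmws_weights` and the data layout are hypothetical, strata are formed by rank so that they are of equal size, and the weight follows Hong's (2012) definition as the marginal share of treatment t divided by the within-stratum share of treatment t. The propensity scores are taken as given (in the study they come from a multinomial logistic regression).

```python
from collections import Counter


def quantile_strata(scores, n_strata=6):
    """Assign each score to one of `n_strata` equal-sized strata by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


def mmws_weights(treatments, propensity, n_strata=6):
    """Compute one MMW-S weight per subject.

    treatments: observed treatment label of each subject.
    propensity: dict mapping each treatment label to the list of propensity
                scores for that treatment, one score per subject.
    """
    n = len(treatments)
    n_t = Counter(treatments)
    # One stratification per set of propensity scores.
    strata = {t: quantile_strata(ps, n_strata) for t, ps in propensity.items()}
    n_s = {t: Counter(s) for t, s in strata.items()}
    n_ts = {
        t: Counter(s for s, z in zip(strata[t], treatments) if z == t)
        for t in strata
    }
    weights = []
    for i, t in enumerate(treatments):
        s = strata[t][i]
        # Marginal share of treatment t over its share within stratum s.
        weights.append((n_t[t] / n) / (n_ts[t][s] / n_s[t][s]))
    return weights
```

When every stratum contains at least one subject from treatment group t, the weights of that group sum to its original size. The final analysis weight would then be the product of each subject's survey sample weight and this MMW-S weight, as described in step (5).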
PISA statisticians used Item Response Theory (IRT) to estimate students’ math,
science, and reading scores. These IRT estimations provided five plausible values
(PVs) for each individual score. Therefore, in the PISA data set each student has
five PVs for math achievement score, five PVs for science, and five PVs for
reading. In this study, math and reading scores were used as covariates, while science scores
were used as an outcome measure. Propensity score analyses were run five times, each time with
one set of the five PVs (denoted PV1–PV5). For instance, the first run used PV1 math, PV1
reading, and PV1 science; the second run used PV2 math, PV2 reading, and PV2 science; and so on.
After the five runs, the results from all runs were averaged to obtain the final results, and
standard deviations were adjusted using the pooled standard deviations.
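
A minimal sketch of combining the five runs is given below. The text does not state the pooling formula, so the root mean of the squared standard deviations used here is an assumption (it presumes the runs share the same sample size), and the function name `combine_runs` is hypothetical.

```python
from math import sqrt
from statistics import mean


def combine_runs(estimates, sds):
    """Average per-run estimates and pool per-run standard deviations.

    Pooling is the square root of the mean variance across runs, which is
    an assumed formula and not necessarily the one used by the authors.
    """
    pooled_sd = sqrt(mean(sd ** 2 for sd in sds))
    return mean(estimates), pooled_sd
```

Called with the five per-run group means and standard deviations (one pair per plausible value), the function returns a single combined mean and standard deviation.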

Results
Propensity score analyses were conducted separately for each country. In the follow-
ing sections, we first present detailed analysis for the US data, and then present sum-
marized results for analyses from all participating countries.

Results from the US data


Of the total of 5,611 US subjects, 5,106 had no missing values on the selected
survey items used to determine the level of openness of inquiry teaching. According
to their responses to the survey items and our definition of the levels of openness of
inquiry teaching (see Table 3), 3,739 of those students were categorized into five
groups. The other 1,367 students, whose reports do not fit our defined levels
of inquiry teaching, were therefore excluded from the analysis.

A multinomial logistic regression model was implemented to estimate the probabilities of each
of the five treatment conditions given the selected covariates for the 3,739 students in the
sample. Every student has five estimated conditional probabilities corresponding to the five
treatment conditions given the student’s observed covariate values. Figure 1 displays the
between-treatment-group comparison of the distribution of the five logit propensity scores. The
vertical lines in the graph represent the lower and upper bounds of the common support across
the five treatment groups. Of the 3,739 students, the 3,410 whose five logit propensity scores
all fall between the bounds were kept for further analysis, while the other 329 students,
roughly 9%, were excluded for lack of common support. Those 3,410 students constitute the
analytic sample of the causal analysis in the study.

Figure 1. Common support for evaluating five-group treatments
Note: LPScore1–5 are the five sets of logit propensity scores
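
The trimming to the region of common support can be sketched as follows. The text does not specify how the bounds were computed, so this sketch assumes a min-max rule (the lower bound is the largest of the group minima and the upper bound is the smallest of the group maxima); the function names are hypothetical.

```python
def common_support_bounds(scores, treatments):
    """Bounds of the score range that is observed in every treatment group."""
    groups = {}
    for score, t in zip(scores, treatments):
        groups.setdefault(t, []).append(score)
    lower = max(min(g) for g in groups.values())
    upper = min(max(g) for g in groups.values())
    return lower, upper


def in_common_support(score_sets, treatments):
    """Flag subjects whose score lies within the bounds on every score set."""
    keep = [True] * len(treatments)
    for scores in score_sets:
        lower, upper = common_support_bounds(scores, treatments)
        keep = [k and lower <= s <= upper for k, s in zip(keep, scores)]
    return keep
```

In the setting described above, `score_sets` would hold the five lists of logit propensity scores, and only subjects flagged `True` on all five would enter the analytic sample.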
Perhaps surprisingly, approximately 30% of the students reported the
highest level of openness in inquiry teaching, which means that around 30% of US
students, mostly 10th graders, have a good chance to select their own investigation
topics in science lessons. On the other hand, around 19% of the students reported
the lowest level of openness in inquiry teaching, which means they sit and watch
their teachers during most science lessons. The percentages of students receiving
the other levels of inquiry teaching are 9% in Level 1 (i.e. students conduct hands-on
activities), 26% in Level 2 (i.e. students draw conclusions from data), and 15% in
Level 3 (i.e. students design investigations). As shown in Table 4, the adjustment for
common support has only a marginal impact on the percentage of subjects in each
treatment group.
Based on the estimated logit propensity scores, six equal strata were created for each
of the five sets of logit propensity scores. Table 5 displays, for each treatment group,
the number of students in each stratum and the corresponding MMW-S weight for
students in that stratum.
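As an illustration of how weights of this kind arise, the sketch below follows the weight formula in Hong (2012) as we understand it: for a student who received treatment level z and falls in stratum s of the logit propensity score for z, the weight is n_s × Pr(Z = z) / n_{z,s}, where n_s is the number of students in that stratum, n_{z,s} is the number of them who actually received level z, and Pr(Z = z) is the overall proportion of students at level z. The function name and toy data are hypothetical, and the reported values may also reflect averaging over the five runs, so the sketch is not expected to reproduce Table 5 exactly.

```python
from collections import Counter


def mmws_weights(treatment, strata):
    """Marginal mean weights through stratification (illustrative sketch).

    treatment[i]: treatment level received by student i (0, 1, ...)
    strata[i][z]: stratum of student i on the propensity score for level z
    """
    n = len(treatment)
    levels = sorted(set(treatment))
    n_z = Counter(treatment)
    # n_s[(z, s)]: all students in stratum s of the propensity score for level z
    n_s = Counter((z, strata[i][z]) for i in range(n) for z in levels)
    # n_zs[(z, s)]: students in that stratum who actually received level z
    n_zs = Counter((z, strata[i][z]) for i, z in enumerate(treatment))
    return [
        n_s[(z, strata[i][z])] * (n_z[z] / n) / n_zs[(z, strata[i][z])]
        for i, z in enumerate(treatment)
    ]


# Toy data (hypothetical): eight students, two treatment levels, two strata.
treatment = [0, 0, 0, 1, 0, 1, 1, 1]
strata = [[0, 1]] * 4 + [[1, 0]] * 4
weights = mmws_weights(treatment, strata)
print([round(w, 2) for w in weights])
```

Students who are under-represented in their stratum relative to the overall share of their treatment level receive weights above 1, which matches the pattern in Table 5 where the sparsely populated first stratum carries the largest weights.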

Assessment of balance. Table 6 compares the between-treatment-group differences in
the distribution of each of the five logit propensity scores before and after weighting.
Results from a one-way ANOVA show that the pretreatment differences among the
five treatment groups were no longer statistically significant after weighting.
To check the extent to which the five treatment groups differed on each selected
covariate, ANOVA analyses were conducted for continuous covariates, and Chi-square
tests were conducted for categorical covariates. Before weighting, the five treatment
groups show statistically significant differences on 8 out of 13 continuous covariates
(Table 7) and 13 out of 20 categorical covariates (Table 8). However, the five
treatment groups show no significant difference on any covariate after weighting by
the MMW-S weight. This indicates that the weighting technique satisfactorily
balanced the distributions of all the selected covariates. The balanced sample also
implies that further analysis of the causal effects is possible.
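A balance check of this kind can be sketched as a weighted one-way ANOVA. The version below is a minimal illustration under stated assumptions, not the software routine used in the study: each observation contributes to the group means and sums of squares in proportion to its weight, and the degrees of freedom are based on the number of observations. The function name is hypothetical.

```python
def weighted_anova_f(values, groups, weights):
    """F statistic of a one-way ANOVA with observation weights (sketch)."""
    total_weight = sum(weights)
    grand_mean = sum(w * y for y, w in zip(values, weights)) / total_weight
    levels = sorted(set(groups))
    ss_between = 0.0
    ss_within = 0.0
    for level in levels:
        idx = [i for i, g in enumerate(groups) if g == level]
        group_weight = sum(weights[i] for i in idx)
        group_mean = sum(weights[i] * values[i] for i in idx) / group_weight
        ss_between += group_weight * (group_mean - grand_mean) ** 2
        ss_within += sum(weights[i] * (values[i] - group_mean) ** 2 for i in idx)
    df_between = len(levels) - 1            # assumption: k - 1
    df_within = len(values) - len(levels)   # assumption: N - k, N = number of observations
    return (ss_between / df_between) / (ss_within / df_within)


# Toy covariate (hypothetical): three groups of three students, unit weights.
values = [1, 2, 3, 2, 3, 4, 3, 4, 5]
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]
f_unweighted = weighted_anova_f(values, groups, [1.0] * 9)
print(f_unweighted)
```

With unit weights the function reduces to the ordinary one-way ANOVA F statistic. Passing the MMW-S weights instead gives the "after weighting" comparison, where a small F indicates that the weighted group means of the covariate are close to one another.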

Table 4. Number of subjects in each level of the openness of inquiry teaching

           Original sample (N = 3,739)    Analytic sample (N = 3,410)
Level      n          Percent             n          Percent
Level 0    715        19.1                647        19.0
Level 1    323        8.6                 317        9.3
Level 2    940        25.1                898        26.3
Level 3    543        14.5                511        15.0
Level 4    1,218      32.6                1,037      30.4

Table 5. Computation of MMW-S for five treatment groups

          Level 0         Level 1         Level 2         Level 3         Level 4
Stratum   N      Weight   N      Weight   N      Weight   N      Weight   N      Weight
1         27     3.33     11     2.76     26     4.22     25     2.71     72     2.56
2         55     2.03     29     1.82     75     1.98     57     1.44     120    1.67
3         82     1.39     47     1.12     115    1.29     75     1.15     135    1.47
4         93     1.24     57     0.94     163    0.93     94     0.93     201    0.98
5         156    0.74     73     0.72     211    0.72     125    0.70     240    0.79
6         234    0.46     100    0.53     308    0.48     135    0.63     269    0.53
Total     647             317             898             511             1,037

Table 6. Between-treatment-group differences in logit propensity score before and after weighting

Logit of propensity for               Before weighting    After weighting
receiving inquiry teaching            F         p         F         p
Receiving Level 0 inquiry teaching    88.64     <.01      2.14      .07
Receiving Level 1 inquiry teaching    36.44     <.01      0.60      .66
Receiving Level 2 inquiry teaching    132.86    <.01      0.61      .66
Receiving Level 3 inquiry teaching    35.77     <.01      0.44      .78
Receiving Level 4 inquiry teaching    122.16    <.01      0.68      .60

Estimation of causal effects. ANOVA analyses were used to examine the effects of the
level of inquiry teaching on student science achievement and attitudes (Table 9). It
was shown that the level of inquiry teaching has a statistically significant impact on
the three outcome measures (student science achievement, interest in science, and
support for scientific inquiry).
What is more interesting than the omnibus test of the significance of the causal
effects is the pattern of change between the groups. Student science achievement
reached the highest point with Level 2 inquiry teaching (see Figure 2), which involves
students in frequently conducting activities and drawing conclusions but not designing
investigations or asking questions. This result is in accordance with the findings from
Minner et al.’s (2010) meta-analysis which favored the concluding component of
inquiry teaching. It is particularly interesting that the highest level of openness of
inquiry teaching, which involves students in frequently asking questions and other
inquiry teaching components, results in the lowest science achievement in students.
The difference between the highest achievement group and the lowest achievement
group is about 11% of the standard deviation of student science achievement.
On the other hand, increasing the level of inquiry teaching showed a consistently
positive effect on the two attitudinal measures, student interest in science, and
support for scientific inquiry (Figures 3 and 4). The differences between the
highest group and the lowest group are about 39% and 35% of a standard deviation
of the outcome, respectively.

Table 7. Differences on continuous covariates among the treatment groups before and after
weighting

                                                         Before weighting    After weighting
Covariate                                                F         p         F         p
Student variables
  ESCS (index of economic, social and cultural status)   16.12     <.01      0.30      .88
  Home educational resources                             20.50     <.01      1.12      .35
  Family wealth                                          13.12     <.01      0.41      .80
  Science courses taken in the last year                 82.25     <.01      0.40      .81
  Science courses taken in this year                     18.28     <.01      0.25      .91
  Math achievement                                       15.90     <.01      0.88      .48
School variables
  Teacher–student ratio                                  2.61      .03       0.40      .81
  Size of English classes                                3.81      <.01      0.56      .70
  Proportion of certified teachers                       2.08      .08       0.07      .99
  School size                                            3.05      .02       0.57      .68
  Percent of students receiving free/reduced lunch       0.41      .80       0.91      .46
  Quality of educational resources                       1.73      .14       0.11      .98
  Activities to promote the learning of science          0.13      .97       0.74      .56

Results from All the Participating Countries: A Brief Report


The previous section presented detailed results of the propensity score analysis based
on the US data. However, the practice and context of science teaching and learning
could be dramatically different across countries, so we conducted the same
analyses separately for the other participating countries of PISA 2006. Of the 57
participating countries, Liechtenstein had a sample size (n = 339) too small to provide a
sufficient number of subjects at each inquiry level, so it was not included in the
analysis.
As previously described, student reports of inquiry teaching were categorized into five
levels, labeled Level 0 to Level 4. For each country, we calculated the percentage of
student-reported inquiry teaching at each level. Moreover, we calculated student
means of science achievement and attitudes, weighted by the MMW-S weight, in each
inquiry level group for each country. The results for all analyzed countries are
presented in Table 10. In the table, instead of reporting the mean values for each
inquiry teaching level, we summarize the results with the sequence of levels ordered
by mean value in descending order. For example, in the US data, student science
achievement from the highest group to the lowest group is Level 2, Level 0, Level 3,
Level 1, and Level 4, so we report the mean sequence of student science achievement in
the US as ‘20314.’ The reason for presenting only the mean sequence instead of each
individual mean score is that we are interested in the relative comparisons between the
mean scores rather than the actual mean scores of each group. More specifically, the focus
of the analysis was to examine the common patterns in the mean sequences and to what
extent the findings identified in the US data are consistent across all participating
countries.

Table 8. Differences on categorical covariates among the treatment groups before and after
weighting

                                                               Before weighting    After weighting
Covariate                                                 DF   χ2        p         χ2        p
Student variables
  Grade                                                   8    78.76     <.01      2.02      .98
  Gender                                                  4    23.10     <.01      1.67      .80
  Science learning time: regular lessons                  20   109.22    <.01      12.36     .90
  Science learning time: out-of-school-time lessons       20   111.20    <.01      10.78     .95
  Science learning time: self-study or homework           20   134.07    <.01      6.48      .99
School variables
  School community                                        20   81.00     <.01      8.28      .99
  School type of ownership                                12   37.86     <.01      8.38      .75
  Students are grouped by ability into different classes  12   27.79     .01       4.94      .96
  Students are grouped by ability within their classes    12   19.89     .23       8.90      .92
  Fill all vacant 10th-grade science teaching positions   12   25.39     .06       8.17      .94
  Lack of qualified science teachers                      16   18.53     .29       8.88      .92
  Lack of laboratory technicians                          16   13.68     .32       3.94      .98
  Lack of science laboratory equipment                    16   12.85     .38       5.35      .95
  Science activities: clubs                               8    16.14     .04       3.12      .93
  Science activities: fairs                               8    31.46     <.01      2.51      .96
  Science activities: competitions                        8    17.80     .02       3.94      .86
  Science activities: projects                            8    10.69     .22       3.23      .92
  Science activities: trips                               8    9.51      .30       3.92      .86
  School concentrates science-related careers             12   16.21     .18       4.86      .96
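To make the mean-sequence notation concrete, the following sketch orders the inquiry levels by their weighted US means as reported in Table 9, from the highest mean to the lowest, and joins the level labels into a string. Doing so reproduces the US row of Table 10. The function name is hypothetical, and the sketch is only an illustration of the notation, not the code used in the study.

```python
def mean_sequence(means):
    """Return the inquiry levels ordered from highest to lowest mean, as a string.

    means[k] is the (weighted) mean outcome for Level k inquiry teaching.
    """
    order = sorted(range(len(means)), key=lambda level: means[level], reverse=True)
    return "".join(str(level) for level in order)


# Weighted US means for Levels 0-4, taken from Table 9.
us_weighted_means = {
    "science achievement": [506.80, 502.93, 516.84, 506.72, 497.57],
    "interest in science": [455.85, 466.99, 473.61, 478.35, 492.09],
    "support for scientific inquiry": [479.11, 478.31, 494.20, 499.05, 510.39],
}

us_sequences = {name: mean_sequence(m) for name, m in us_weighted_means.items()}
for name, sequence in us_sequences.items():
    print(name, sequence)
```

A sequence of ‘43210’ therefore indicates an outcome that rises steadily with the level of inquiry teaching, while a sequence beginning with ‘2’ indicates that Level 2 has the highest mean.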
For science achievement, Level 2 inquiry had the highest score in the
US data. Across the 56 analyzed countries, this result held in 22 countries. In
addition, Level 2 inquiry had the second highest score in 18 countries. More
specifically, Level 2 inquiry had a higher score than Level 0 in 37 countries, higher than
Level 1 in 39 countries, higher than Level 3 inquiry in 41 countries, and higher
than Level 4 inquiry in 47 countries. As for Level 4 inquiry, it had the lowest
science achievement score in the US data. This result appears in 26 countries.
In addition, Level 4 inquiry had the second lowest science achievement in 14
countries.
For science attitudes, the US data demonstrated that student interest in science and
support for scientific inquiry increased when the inquiry teaching level increased.
Across the analyzed countries, this pattern can be found in 19 countries for student
interest in science and 16 countries for student support for scientific inquiry. After
allowing for a switch of just one position in the mean sequence (e.g. ‘43201’), the
two numbers increased to 40 countries for student interest in science and 31 countries
for student support for scientific inquiry. As for Level 4 inquiry teaching, it had the
highest student interest in science in 47 countries and the highest student support for
scientific inquiry in 42 countries.

Table 9. Effects of the level of inquiry teaching on science achievement and attitudes

                 Level 0          Level 1          Level 2          Level 3          Level 4
Variable         M       SD       M       SD       M       SD       M       SD       M       SD       F         R2
Science achievement
  Unweighted     489.85  99.98    500.65  92.93    553.11  89.23    514.32  92.55    474.79  98.38    15.56*    0.11
  Weighted       506.80  102.49   502.93  92.43    516.84  94.24    506.72  96.04    497.57  100.16   12.69*    0.01
Interest in science
  Unweighted     444.57  104.00   464.43  97.09    465.29  92.43    478.90  102.34   501.22  102.63   146.85*   0.05
  Weighted       455.85  103.87   466.99  91.90    473.61  94.17    478.35  100.65   492.09  98.25    86.97*    0.02
Support for scientific inquiry
  Unweighted     461.58  100.09   476.41  89.64    505.85  86.48    502.10  98.46    508.07  103.25   113.69*   0.03
  Weighted       479.11  99.32    478.31  88.51    494.20  86.26    499.05  98.88    510.39  102.36   50.78*    0.02

*p < .01.

Figure 2. The effect of the level of inquiry teaching on student science achievement after weighting

Figure 3. The effect of the level of inquiry teaching on student interest in science after weighting

Discussion
Level of Inquiry and Content Acquisition
The propensity score analysis reported here reveals that there were significant effects
of the level of openness in inquiry teaching on student science achievement and
attitudes toward science. In itself, this is likely not a surprising result, but those effects
show different patterns on different outcome measures. For instance, with respect to
student science achievement, the highest score was achieved at Level 2 inquiry teaching,
where students frequently conduct activities and draw conclusions from data. For
student science attitudes (interest in science and student support for scientific
inquiry), higher outcome scores were achieved when the level of inquiry teaching
was increased. These conclusions generally held in most of the PISA 2006 participating
countries, which provides a measure of reliability for the result.

Figure 4. The effect of the level of inquiry teaching on student support for scientific inquiry after
weighting
Previous meta-analyses have demonstrated that inquiry teaching has inconsistent
effects on student science achievement. Some studies have shown positive effects,
and others have shown negative effects. The study reported here has shown that the
complexity of the impact of inquiry teaching on student science learning is related
to the level of openness in inquiry teaching. The effect of inquiry teaching is somewhat
complicated because our results show that increasing the level of openness in inquiry
teaching is not always beneficial to student science learning. Here, the students’
highest science achievement is obtained when they are involved in conducting activi-
ties and drawing conclusions from data only, rather than in higher level inquiry activi-
ties such as designing the investigation or raising their own questions. This pattern
was also identified in Minner et al.’s (2010) meta-analysis. In addition, our finding
adds to the evidence contrary to Kirschner et al.’s (2006) assertion that direct instruc-
tion is better than inquiry teaching, although we have already offered a cogent criti-
cism of the definition of inquiry used in their report.
Please note that the PISA science assessment items focus on students’ understand-
ing of science content, but are not designed to explicitly examine students’ epistemic
knowledge or understanding of the nature of science. Therefore, while it would be
interesting to gauge this domain directly, it is not possible to link the level of
inquiry to such aspects of student understanding through our analysis.

Table 10. Percent of student reports at each inquiry teaching level and mean sequences of science
achievement and attitudes for all participating countries

                            Percent of student reports   Mean sequence
                                                  Science      Interest    Support for
Country             n       L0  L1  L2  L3  L4    achievement  in science  scientific inquiry
Argentina           533     42  6   15  8   29    12304        14320       14302
Australia           7,382   27  11  33  11  18    02314        43210       43210
Austria             2,355   70  5   10  5   10    23041        43210       43210
Azerbaijan          1,202   23  5   5   6   61    02134        42301       40321
Belgium             4,016   61  7   21  4   6     32104        34210       32410
Bulgaria            1,963   54  6   7   5   28    23401        42301       42301
Brazil              536     48  5   6   7   35    30214        14023       41023
Canada              11,720  30  12  28  10  20    32104        43210       43210
Switzerland         6,283   42  8   27  8   15    32041        43201       34210
Chile               2,106   39  9   12  8   31    21304        43201       43021
Colombia            814     21  9   23  11  36    02431        41203       43120
Czech Republic      3,761   59  15  18  3   5     01234        43210       42130
Germany             2,317   39  7   31  8   14    21034        43201       42301
Denmark             2,808   17  19  31  21  12    24301        43210       43210
Spain               11,467  68  7   13  4   9     21304        43120       43210
Estonia             2,022   56  12  16  5   12    20143        43201       34201
Finland             2,912   43  13  36  4   4     12034        43210       43210
France              2,719   26  10  41  11  12    21340        43210       43210
UK                  7,911   25  13  37  12  13    23410        43210       43210
Greece              2,093   54  6   11  6   24    20143        43120       43210
Hong Kong-China     2,355   35  17  25  7   16    10423        43210       43201
Croatia             1,810   67  5   9   2   16    34210        43210       34120
Hungary             1,750   87  3   3   2   6     20413        32140       23140
Indonesia           5,927   33  9   12  9   37    21034        43102       43120
Ireland             2,565   23  18  42  8   9     42310        43201       34210
Iceland             1,914   81  6   5   4   5     10324        21430       12043
Israel              1,950   31  9   21  10  28    01234        43210       42301
Italy               13,003  65  9   13  3   10    01234        43210       43201
Jordan              2,820   21  8   9   7   55    01234        40312       41032
Japan               4,196   65  17  10  3   5     21304        43201       42310
Kyrgyzstan          2,231   13  11  8   7   61    20134        43120       43210
Korea               3,315   71  12  7   4   6     30142        34210       43210
Lithuania           2,314   34  13  36  6   11    32410        43210       43201
Luxembourg          2,102   59  6   17  6   12    21034        43210       34210
Latvia              2,230   47  15  23  4   12    13024        34210       34210
Macao-China         953     54  19  15  4   9     02413        42310       41203
Mexico              13,664  25  8   17  10  40    12430        43201       43210
Montenegro          1,581   62  7   6   3   22    31024        43210       34102
Netherlands         2,482   49  10  23  8   9     42103        42310       42310
Norway              3,090   47  12  23  8   10    20314        43120       21340
New Zealand         2,844   33  14  31  9   13    23140        43210       43210
Poland              2,087   63  5   15  3   14    32041        34210       34210
Portugal            1,956   44  12  18  8   19    14023        42310       42130
Qatar               2,825   27  5   8   7   53    23041        43201       42031
Romania             2,057   33  12  12  9   34    21034        41023       41023
Russian Federation  3,670   22  13  21  13  31    02314        43210       43210
Serbia              1,780   71  6   6   3   14    04123        41230       42130
Slovak Republic     2,810   53  17  16  5   9     31204        41302       43120
Slovenia            3,681   52  19  9   6   15    32041        34210       43210
Sweden              1,708   31  10  35  12  12    23041        43210       24130
Chinese Taipei      5,438   52  21  12  3   13    12043        42103       42103
Thailand            3,237   23  7   10  8   52    34201        43120       43210
Tunisia             1,629   15  7   16  10  52    02143        42301       42103
Turkey              1,900   47  4   4   6   38    10432        43021       43012
Uruguay             1,489   37  8   24  9   21    20143        43102       43120
USA                 3,410   19  9   26  15  30    20314        43210       43201

Note: L# = Level # inquiry teaching.

Level of Inquiry on Students’ Attitudes and Content Acquisition


This study has also provided some interesting conclusions by demonstrating how
student science attitudes were impacted by inquiry teaching. This has rarely been
studied in previous research. We found contradictory effects regarding the level of
inquiry teaching on knowledge acquisition and the attitudinal outcomes of science
learning as the level of openness of inquiry teaching increased. Consider the
dilemma that teachers would face when recognizing that the highest levels of
inquiry teaching can significantly and positively impact student attitudes toward
science but moderate levels of inquiry teaching are more efficient in terms of
helping students understand science content. This result may be explained by recognizing
that higher levels of inquiry teaching require more instruction time
than lower levels of inquiry teaching.
Perhaps the most exciting outcome of this study is its clear demonstration
that not all inquiry teaching is the same. Inquiry teaching is a highly
complex pedagogical act, and most educators certainly recognize that not all versions
of it are equally applicable in all instructional situations for all students with all
content. The most capable science teachers understand that there is a time and place for
high-quality didactic instruction just as there are reasons to apply lower and higher
levels of inquiry. The key is to know how and when to apply any instructional
modality. The results of this study suggest that it is wise to spend most of classroom
instructional time at Level 2 of inquiry teaching (students conduct hands-on activities
and draw conclusions from data). This will ensure that students learn science content
and gain insights about science.
Perhaps the most open-ended inquiry activities, in which students select their own
research topics and design their own investigations, should be used sparingly as
after-class projects or in science fair situations, while classroom instruction time is
devoted to projects at low levels of inquiry. Please note that this study does not suggest
that inquiry has no place in science instruction, as Kirschner et al. (2006) might suggest
with their unsophisticated view of this important teaching technique. Rather, the
proper level of inquiry should be applied with knowledge of the strengths and
limitations of each level.
Another useful outcome of this study is to have demonstrated the use of propensity
score analysis for multi-valued treatment, a fairly new technique, in educational
research. Based on the propensity score analysis, the selection bias was controlled
for observed covariates and more valid causal inference was established to estimate
the effects of inquiry teaching on different components of student science achievement
and attitudes. However, our current analysis should be seen as preliminary,
and we are cautious about making causal claims. To establish more credible causal
inference, the next stage of analysis will involve sensitivity analysis on unobserved covariates.
Finally, we conclude with a word about a central limitation inherent in the current
analysis. This limitation links to the lack of the evidence of the validity of student
report on the use of teaching strategies. We fully recognize that we are taking the
students’ self-reported data at face value. How students interpreted the questions as
they responded to them is a major source of potential error in this analysis.
Without careful examination, it is questionable whether such student reports accu-
rately represent the reality of the teaching practices they witnessed. In addition, con-
straints of the PISA data made it impossible to examine the teacher effect so we are
forced to report that we can make no statement about whether the effects of
inquiry teaching are due to the differences among the teachers. Nevertheless, the find-
ings of this study suggest that propensity score analysis is a powerful new tool to apply
in educational research settings and that inquiry is as rich and complex a pedagogical
tool as the best science teachers have known for generations.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This research was supported by a grant from the American Educational Research
Association which receives funds for its ‘AERA Grants Program’ from the National
Science Foundation under NSF Grant [#DRL-0941014]. Opinions reflect those of
the author(s) and do not necessarily reflect those of the granting agencies.

References
Anderson, R. D. (2007). Inquiry as an organizing theme for science curricula. In S. K. Abell & N. G.
Lederman (Eds.), Handbook of research on science education (pp. 807–830). Mahwah, NJ: Lawr-
ence Erlbaum Associates.
Bredderman, T. (1983). Effects of activity-based elementary science on student outcomes: A
quantitative synthesis. Review of Educational Research, 53(4), 499–518.
Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity
score matching. Journal of Economic Surveys, 22(1), 31–72.
Duschl, R. A. (2003). Assessment of inquiry. In J. M. Atkin & J. Coffey (Eds.), Everyday assessment
in the science classroom (pp. 41–59). Arlington, VA: NSTA Press.
Fan, X., & Nowell, D. L. (2011). Using propensity score matching in educational research. Gifted
Child Quarterly, 55(1), 74–79.
Frölich, M. (2004). Programme evaluation with multiple treatments. Journal of Economic Surveys,
18(2), 181–224.
Furtak, E. M., Seidel, T., Iverson, H., & Briggs, D. (2009). Recent experimental studies of inquiry-
based teaching: A meta-analysis and review. Presented at the 13th Biennial Conference of the
European Association for Research in Learning and Instruction, Amsterdam, Netherlands.
Herron, M. D. (1971). The nature of scientific enquiry. The School Review, 79(2), 171–212.
Hmelo-Silver, C. E., Duncan, R. G., & Chinn, C. A. (2007). Scaffolding and achievement in
problem-based and inquiry learning: A response to Kirschner, Sweller, and Clark (2006).
Educational Psychologist, 42(2), 99–107.
Hong, G. (2012). Marginal mean weighting through stratification: A generalized method for eval-
uating multivalued and multiple treatments with nonexperimental data. Psychological Methods,
17(1), 44–60.
Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions.
Biometrika, 87(3), 706–710.
Jiang, F., & McComas, W. F. (2012). The effects of level of openness in inquiry teaching on student science
achievement and attitudes: Evidence from propensity score analysis with PISA 2006 U.S. data. Paper
presented at the Annual Conference of National Association of Research in Science Teaching,
Indianapolis, IN.
Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does
not work: An analysis of the failure of constructivist, discovery, problem-based, experiential,
and inquiry-based teaching. Educational Psychologist, 41(2), 75–86.
Lechner, M. (2001). Identification and estimation of causal effects of multiple treatments under the
conditional independence assumption. In M. Lechner & F. Pfeiffer (Eds.), Econometric evalu-
ation of labour market policies (Vol. 13, pp. 43–58). Heidelberg: Springer.
Lott, G. W. (1983). The effect of inquiry teaching and advance organizers upon student outcomes in
science education. Journal of Research in Science Teaching, 20(5), 437–451.
Minner, D. D., Levy, A. J., & Century, J. (2010). Inquiry-based science instruction—what is it and
does it matter? Results from a research synthesis years 1984 to 2002. Journal of Research in
Science Teaching, 47(4), 474–496.
National Research Council. (1996). National science education standards. Washington, D.C.: The
National Academies Press.
Organisation for Economic Co-operation and Development. (2006). Assessing scientific, reading and
mathematical literacy: A framework for PISA 2006. Paris: OECD Publishing.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational
studies for causal effects. Biometrika, 70(1), 41–55.
Schroeder, C. M., Scott, T. P., Tolson, H., Huang, T.-Y., & Lee, Y. (2007). A meta-analysis of
national research: Effects of teaching strategies on student achievement in science in the
United States. Journal of Research in Science Teaching, 44(10), 1436–1460.
Schwab, J. J. (1962). The teaching of science as enquiry. In J. J. Schwab & P. F. Brandwein (Eds.),
The teaching of science (pp. 1–103). Cambridge, MA: Harvard University Press.
Shulman, L. S., & Tamir, P. (1973). Research on teaching in the natural sciences. In R. M. W.
Travers (Ed.), Second handbook of research on teaching: A project of the American Educational
Research Association (pp. 1098–1148). Chicago, IL: Rand McNally.
Shymansky, J. A., Hedges, L. V., & Woodworth, G. (1990). A reassessment of the effects of
inquiry-based science curricula of the 60’s on student performance. Journal of Research in
Science Teaching, 27(2), 127–144.
Shymansky, J. A., Kyle, W. C., Jr., & Alport, J. M. (1983). The effects of new science curricula on
student performance. Journal of Research in Science Teaching, 20(5), 387–404.
Spreeuwenberg, M. D., Bartak, A., Croon, M. A., Hagenaars, J. A., Busschbach, J. J. V., Andrea, H.,
. . . Twisk, J. (2010). The multiple propensity score as control for bias in the comparison of more
than two treatment arms: An introduction from a case study in mental health. Medical Care, 48,
166–174.
Stuart, E. A., & Rubin, D. B. (2008). Matching methods for causal inference: Designing
observational studies. In J. Osborne (Ed.), Best practices in quantitative methods (pp. 155–176).
London: Sage.
Treagust, D. F. (2007). General instructional methods and strategies. In S. K. Abell & N. G.
Lederman (Eds.), Handbook of research on science education (pp. 373–391). Mahwah, NJ: Lawrence
Erlbaum Associates.
Weinstein, T., Boulanger, F. D., & Walberg, H. J. (1982). Science curriculum effects in high school:
A quantitative synthesis. Journal of Research in Science Teaching, 19(6), 511–522.
Windschitl, M. (2008). What is inquiry? A framework for thinking about authentic scientific prac-
tice in the classroom. In J. Luft, R. L. Bell, & J. Gess-Newsome (Eds.), Science as inquiry in the
secondary setting (pp. 1–20). Arlington, VA: NSTA Press.
Wise, K. C., & Okey, J. R. (1983). A meta-analysis of the effects of various science teaching
strategies on achievement. Journal of Research in Science Teaching, 20(5), 419–435.
