To cite this article: Feng Jiang & William F. McComas (2015) The Effects of Inquiry
Teaching on Student Science Achievement and Attitudes: Evidence from Propensity Score
Analysis of PISA Data, International Journal of Science Education, 37:3, 554-576, DOI:
10.1080/09500693.2014.1000426
Gauging the effectiveness of specific teaching strategies remains a major topic of interest in science
education. Inquiry teaching among others has been supported by extensive research and
recommended by the National Science Education Standards. However, most of the empirical
evidence in support was collected in research settings rather than in normal school environments.
Propensity score analysis was performed within the marginal mean weighting through
stratification (MMW-S) approach to examine the effects of the level of openness of inquiry
teaching on student science achievement and attitudes with the Programme for International
Student Assessment (PISA) 2006 data. Weighting subjects on MMW-S weight successfully
balanced all treatment groups on all selected covariates. Significant effects were identified on
both cognitive and attitudinal outcomes. For student science achievement, the highest score was
achieved at Level 2 inquiry teaching, that is, students conduct activities and draw conclusions
from data. For student science attitudes, higher levels of inquiry teaching resulted in higher
scores. These conclusions generally held in most PISA 2006 participating countries when
the analysis was performed in each country separately.
∗Corresponding author. Office of Innovation for Education, University of Arkansas, 309 WAAX,
Fayetteville, AR 72701, USA. Email: fjiang@uark.edu
Introduction
Gauging the effectiveness of applying any specific teaching strategy to meet a particular
learning goal remains a major topic of interest in science education. For instance,
given the continuing focus on the use of inquiry teaching, measuring its impact on
important educational outcomes would be an important goal. Inquiry teaching has
been supported by extensive research, is widely endorsed by the science education
community (e.g. Anderson, 2007; Treagust, 2007; Windschitl, 2008), and is rec-
ommended by the National Science Education Standards (National Research
Council, 1996). However, the rationale of inquiry teaching, particularly open
inquiry, has also been questioned by some educational psychologists (Kirschner,
Sweller, & Clark, 2006). Although the effectiveness of inquiry teaching has been sup-
ported by empirical studies (Minner, Levy, & Century, 2010), most of the evidence
was collected in research settings rather than in school environments. Jiang and
McComas’s (2012) study on the Programme for International Student Assessment
(PISA) data revealed an unexpected negative correlation between the use of student
investigations and student science achievement, thus providing the impetus for the
present study.
The term inquiry teaching has been used to describe many dramatically different
teaching practices, all of which involve student decision-making to one degree or
another. However, it is wise not to assume that all versions of inquiry would
make the same impact on student science learning and related attitudes. An important
distinction in inquiry teaching is between guided inquiry and more ‘open’ varieties.
While the distinctions between types of inquiry are well known, the impacts of
these distinct forms have not been carefully examined, particularly in real school set-
tings. Therefore, the purpose of the present study is to examine the effect of different
levels of inquiry teaching on student science achievement and attitudes with the PISA
data, through the lens of causal inference provided by the application of propensity
score analysis.
Review of Literature
Empirical Evidence Supporting Inquiry Teaching
Inquiry in science teaching plays at least two different roles. First, inquiry as a practice
engaged in by scientists should be taught as part of the science curriculum. Second,
and perhaps more importantly, inquiry should be used as a pedagogical tool
through which students can learn science content and practice through experiencing
the process of inquiry itself. The National Science Education Standards (National
Research Council, 1996) describes inquiry as a set of science practices:
Inquiry is a multifaceted activity that involves making observations; posing questions;
examining books and other sources of information to see what is already known; planning
investigations; reviewing what is already known in light of experimental evidence; using
tools to gather, analyze, and interpret data; proposing answers, explanations, and predic-
tions; and communicating the results. Inquiry requires identification of assumptions, use
of critical and logical thinking, and consideration of alternative explanations. (p. 23)
Formerly, inquiry teaching was often connected to the concept of ‘discovery teaching.’
This was the case in Wise and Okey’s (1983) meta-analysis of science teaching
strategies. Wise and Okey simply call inquiry teaching a ‘more student-centered and
less step-by step teacher directed learning experience’ (p. 421). Such an orientation is
reflective of the philosophy of constructivist educational theory, so it was not a surprise
to see Kirschner et al. (2006) consider inquiry teaching, discovery, and constructivism
as essentially the same. Unfortunately, the label ‘minimally guided
instructional approach’ that they stamped on such teaching strategies is problematic
(Hmelo-Silver, Duncan, & Chinn, 2007). Inquiry teaching is a broad suite of strategies,
but Kirschner et al. (2006) made the broad claim that there is ‘overwhelming
and unambiguous evidence’ that ‘minimally guided instruction’, which includes
inquiry teaching, ‘is significantly less effective and efficient than guidance specifically
designed to support the cognitive processing necessary for learning’ (p. 76). Evidence
in favor of inquiry from meta-analyses on inquiry teaching does not support their
assertion, as we will see in this section.
Before reviewing the evidence, it is necessary to provide some clarification about
why some frequently cited meta-analyses regarding inquiry are not included in the
following review. First, although four meta-analyses (Bredderman, 1983; Shymansky,
Hedges, & Woodworth, 1990; Shymansky, Kyle Jr., & Alport, 1983; Weinstein,
Boulanger, & Walberg, 1982) comparing the so-called innovative curricula to tra-
ditional curricula have been repeatedly cited for supporting inquiry teaching, they
lack a clear definition of inquiry teaching or inquiry-based curriculum. Moreover,
even if these curricula were based on inquiry principles, there is no guarantee that
they would be used effectively in classrooms. Further, schools that adopt those curricula
could be systematically different from those using traditional curricula, while this selec-
tion bias was not handled in those meta-analytic studies. Second, although Lott (1983)
seemed to address inquiry teaching given the title of his meta-analysis, he considered
inquiry teaching the same as inductive teaching, a popular inaccurate perspective.
This is related to a common misconception about the nature of science, which mistakenly
simplifies scientific inquiry as inductive reasoning. Finally, Wise and Okey
(1983) provided a meta-analysis for a variety of teaching strategies including inquiry,
but their analysis did not report the rigor of the analyzed studies or detailed
information on inquiry teaching.
Fortunately, three more robust meta-analyses were published recently that are more
rigorous from both conceptual and methodological perspectives. These are con-
sidered as appropriate evidence for evaluating the impact of inquiry teaching.
Schroeder, Scott, Tolson, Huang, and Lee (2007) conducted a meta-analysis of US
research published from 1980 to 2004 with respect to the effect of eight science teach-
ing strategies, including inquiry teaching, on student achievement. Sixty-one studies
were analyzed and included based on the criteria that: they were carried out in the
USA, were experimental or quasi-experimental, and included effect size or the stat-
istics necessary to calculate effect size. Inquiry strategies were defined as those in
which ‘teachers use student-centered instruction that is less step-by-step and
teacher-directed than traditional instruction; students answer scientific research ques-
tions by analyzing data (e.g., using guided or facilitated inquiry activities, laboratory
inquiries)’ (p. 1446). In the sixty-one analyzed studies, 12 were focused on inquiry
strategies. The effect size of the inquiry strategies was 0.65. However, no more
detailed information is provided specifically for inquiry teaching, because the study
analyzed several teaching strategies.
Furtak, Seidel, Iverson, and Briggs’s (2009) meta-analysis focused on experimental
and quasi-experimental studies of K-12 inquiry-based classroom science teaching
published between 1996 and 2006. Based on Duschl’s (2003) conceptual framework
of inquiry teaching, the authors defined a four-faceted model for inquiry which
includes conceptual, procedural, epistemic, and social facets. Nine studies from six
reports (with three replications) were included in their analysis. In the nine studies,
effect sizes ranged from −0.27 to 2.95, with two negatives and seven positives. The
researchers also performed specific analyses on relationships between effect size
and other factors, such as the four facets, grade level, duration of treatment, and
teacher-led versus student-led conditions. Based on their findings, the researchers
highlighted two factors that could increase positive effect size: an emphasis on
the epistemic facet of inquiry and a longer exposure of students to inquiry
instruction.
Minner et al.’s (2010) Inquiry Synthesis Project synthesized findings from research
conducted between 1984 and 2002 to determine the impact of inquiry science
instruction on K-12 student outcomes. The research team first developed a concep-
tual framework for inquiry science instruction, consisting of three aspects: (1) the
presence of science content, (2) student engagement with science content, and
(3) components of instruction. The components of instruction were defined in a
two-dimensional table of specifications. The first dimension contains three instruc-
tional domains: student responsibility for learning, student active thinking, and
student motivation. The second dimension contains five inquiry components: question,
design, data, conclusion, and communication. In the synthesis, 138 studies
were included, with 73 non-experimental studies, 35 quasi-experimental studies,
and 30 experimental studies. Within the analyzed studies, 51% showed positive
impacts of inquiry teaching on student learning. The researchers concluded that
the evidence of positive effects of inquiry-based instruction was not conclusive, and
the association between the level of inquiry saturation and student learning
outcome was also modest. At the same time, they argued that there was a clear and positive
trend favoring inquiry-based instructional practices when instruction emphasized
student active thinking and drawing conclusions from data. The researchers also rated
studies based on their methodological rigor, but they found that methodological rigor was
not significantly related to the effect of inquiry teaching on student outcomes.
Based on these three meta-analyses, it is reasonable to conclude that the effect of
inquiry teaching is no worse and perhaps somewhat better than conventional
science teaching. Nevertheless, positive research evidence does not always guarantee
successful practice. Although the randomized controlled trial is considered the ‘gold standard’
in methodology, the teaching strategy implemented in regular classrooms is not
necessarily the same as that implemented in instructional experiments. In a preliminary
study, Jiang and McComas (2012) identified a significant negative correlation
between the use of student investigations and student science achievement. It is
Methodology
The purpose of this study was to examine the effects of levels of openness in inquiry
teaching on student science achievement and their attitudes toward science. Because
the study was conducted on non-experimental assessment data, to establish a valid
causal inference, the propensity score analysis was performed within the marginal
mean weighting through stratification (MMW-S) approach (Hong, 2012). The
specific research question answered in this study is: What are the effects of the level
of openness in inquiry teaching on student science achievement and attitudes?
Measurement
This study involved the use of the PISA data from 2006. PISA is a worldwide evalu-
ation of 15-year-old school students’ scholastic performance in three domains:
reading, mathematics, and science. The assessment was first administered in 2000
and has been repeated every three years. In each PISA administration, all three domains
(reading, math, and science) are assessed, and one domain is emphasized through
the inclusion of an extensive set of questions. In 2006, the PISA assessment emphasized
students’ science literacy with the inclusion of extensive survey items. This
enabled analyses that are not possible with the more recent PISA 2009 and 2012 data, which did
not have the science focus. This study involved the use of three groups of variables:
(1) outcome variables regarding the measurement of students’ scientific achievement
and attitudes, (2) treatment variables regarding inquiry teaching activities, and
(3) covariates such as demographic information, socioeconomic status, school
characteristics, and student scholastic aptitudes.
Science Learning Outcomes in PISA. Science learning outcomes were assessed for
both cognitive and affective elements (Organisation for Economic Co-operation
and Development, 2006). Science competency is the central construct in the PISA
assessment and was defined in three subscales: identifying scientific issues, explaining
phenomena scientifically, and using scientific evidence. However, we did not conduct
analysis at the subscale level because these subscales are so strongly correlated with
coefficients greater than 0.9 that analyses of different subscales provide almost the
same results. The attitudinal science learning outcomes include three components:
interest in science, support for scientific inquiry, and responsibility toward resources
and environments. The last attitude measure—responsibility towards resources and
environments—was not included in the analysis because it was not a focus of the
study. Both the cognitive and attitudinal outcomes were analyzed. Therefore, this
study had three outcome measures including science achievement (overall science
competency), interest in science, and support for scientific inquiry.
Level 0 T T T T
Level 1 S T T T
Level 2 S S T T
Level 3 S S S T
Level 4 S S S S
frequently (never or hardly ever, in some lessons, in most lessons, in all lessons)
specific activities were used in science teaching. In the 17 items under this question,
seven of them were directly related to inquiry teaching.
In the PISA data set, the frequencies of using the teaching strategies were calibrated
with Rasch models and a score was generated for each teaching strategy. However, it is
difficult to interpret such scale scores because each scale includes several items and
each item addresses different levels of inquiry teaching, such as students make own
conclusion, design own investigations, or choose own topics. It is hard to tell
whether a higher scale score indicates the use of higher level of inquiry teaching or
a higher frequency in using low-level inquiry. Therefore, the scaled scores provided
by PISA were not used in the study. Rather, the levels of inquiry teaching were
defined based on our conceptual framework.
The seven items related to inquiry teaching and the corresponding targeted inquiry
components are shown in Table 2. Of the seven items, items 1 and 2 target conducting
activities; item 3 targets drawing conclusions; items 4 and 5 target designing
investigations; and items 6 and 7 target asking questions. For each item, the frequency
values are coded as: 1 = never or hardly ever, 2 = in some lessons, 3 = in
most lessons, and 4 = in all lessons. For the targeted inquiry components that
contain two questionnaire items, the frequency value is the average
of the two. For example, the frequency value of conducting activities is the average
of items 1 and 2.
Based on the frequencies of the four inquiry components, we defined five levels of
inquiry teaching (see Table 3). The third level of frequency in the Likert scale, ‘in
most lessons’, is selected as the cutoff indicating sufficient implementation of each
inquiry component. At Level 0, none of the inquiry components is sufficiently
implemented; at Level 1, only conducting activities is sufficiently implemented, but
not drawing conclusions, designing investigations, or asking questions; at Level 2,
only conducting activities and drawing conclusions are sufficiently implemented, but
not designing investigations or asking questions; at Level 3, conducting activities,
drawing conclusions, and designing investigations are sufficiently implemented, but
not asking questions; and at Level 4, all the components are sufficiently implemented.
Table 2. Selected inquiry teaching survey items in PISA 2006 for defining the levels of inquiry
teaching
Table 3. Five levels of inquiry teaching with the reported frequency values
           Conducting    Drawing        Designing         Asking
           activities    conclusions    investigations    questions
Level 0    ≤2            ≤2             ≤2                ≤2
Level 1    >2            ≤2             ≤2                ≤2
Level 2    >2            >2             ≤2                ≤2
Level 3    >2            >2             >2                ≤2
Level 4    >2            >2             >2                >2
Note: frequency: 1, never or hardly ever; 2, in some lessons; 3, in most lessons and 4, in all lessons.
We expected that some of the student reports would not fit the defined pattern.
For example, some students may report high frequency use of designing investigations
but low frequency of conducting activities. The mismatch between student reports
and the proposed pattern may have two causes. First, some teachers
may not use inquiry teaching properly; second, some students may not report teaching
activities correctly. Whatever the reason, those cases should be excluded from the
analysis of the effect of different levels of inquiry teaching on learning outcomes.
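The coding rules described above can be sketched in a few lines. This is only an illustrative reading of the text, not the authors' code: the dictionary keys, function names, and the input format (a mapping from item number to frequency code) are hypothetical, and the cutoff follows Table 3, where an averaged frequency above 2 counts as sufficient implementation.

```python
from statistics import mean

# Component order follows the text: conducting activities, drawing conclusions,
# designing investigations, asking questions. Item numbers are those of Table 2.
COMPONENT_ITEMS = {
    "conduct": (1, 2),
    "conclude": (3,),
    "design": (4, 5),
    "ask": (6, 7),
}
CUTOFF = 2  # per Table 3, a component is sufficiently implemented when frequency > 2


def component_frequencies(responses):
    """Average the 1-4 frequency codes of the items behind each component.

    `responses` (hypothetical format) maps item number 1-7 to its code 1-4.
    """
    return {
        name: mean(responses[i] for i in items)
        for name, items in COMPONENT_ITEMS.items()
    }


def inquiry_level(responses):
    """Return the inquiry level 0-4, or None when the report breaks the pattern."""
    freqs = component_frequencies(responses)
    flags = [freqs[name] > CUTOFF for name in COMPONENT_ITEMS]
    level = sum(flags)
    # A valid report implements the first `level` components and none after them.
    if flags != [True] * level + [False] * (4 - level):
        return None  # e.g. frequent designing but infrequent conducting
    return level
```

A report with frequent activities and conclusions but infrequent design and questioning is coded as Level 2, while a report that skips a lower component is returned as None and would be dropped from the analysis.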
Analysis
This study estimated the causal effect of the reported level of inquiry teaching on
student science achievement and their attitudes toward science. The levels of
inquiry teaching were considered as five categories of treatments in the causal analysis.
Previous analysis has found a negative correlation between student investigations and
science achievement (Jiang & McComas, 2012). In this case, the correlation does not
necessarily indicate causation because the treatments were not randomly assigned. As
an enhancement to the previous study to reduce selection bias and examine the causal
effect of inquiry teaching on science achievement and attitudes, the present study
involved the use of a propensity score analysis following the MMW-S approach
(Hong, 2012).
PISA 2006 is an international assessment administered in 57 countries (or
regions), and the practice and context of science teaching and learning are likely to
differ among the participating countries. Therefore, our analyses were conducted
separately for the data from each country. In other words, all the analyses
treated each country as a standalone data set and followed the same procedures of
propensity score analysis. Conducting the analyses in each country separately
also allowed us to examine whether the impact of inquiry teaching on science achievement
and attitudes is consistent across participating countries.
Propensity score analysis. Propensity score (Rosenbaum & Rubin, 1983) is the esti-
mated probability that a subject would be assigned to a particular treatment con-
dition (e.g. a particular level of inquiry teaching), given the subject’s
characteristics with respect to the covariates. By matching or weighting on pro-
pensity scores, selection bias can be significantly reduced and valid estimation
of causal effect can be established. Combining suggestions from the
recent literature (Caliendo & Kopeinig, 2008; Frölich, 2004; Spreeuwenberg
et al., 2010; Stuart & Rubin, 2008), the causal analysis consisted of the following
major steps:
(1) Select the covariates. Choosing a large set of covariates is beneficial in reducing
selection bias (Stuart & Rubin, 2008). The covariates were selected based on
theory and the results of previous empirical studies, rather than depending on
the correlations in the current data set (Caliendo & Kopeinig, 2008). The pool
of covariates included variables regarding students’ demographic and socioeco-
nomic status, the cognitive characteristics of students, the attitudinal character-
istics of students, and school characteristics (Fan & Nowell, 2011). In this
study, 33 variables (see Tables 7 and 8) were included in the multinomial logistic
regression model to estimate the probability of a student being assigned to a
specific inquiry level. As for missing data in categorical covariates, missing
values were replaced with indicators. Missing data in continuous covariates
were imputed through maximum likelihood estimation.
(2) Estimate the propensity scores. Because the treatment is a five-level categorical
variable, the generalized propensity scores were estimated with multinomial logis-
tic regression (Imbens, 2000; Lechner, 2001). One set of propensity scores was
generated for each treatment level.
(3) Identify the common support. The region of overlap on propensity scores across
different treatment groups was identified and subjects outside the region were
dropped. Overlapping histograms were used to check visually whether the overlap
was sufficient.
(4) Stratify the subjects. The subclassification was performed for each set of propensity
scores separately by splitting the sample into strata based on the quantiles of
propensity scores. Although the numbers of strata are not necessarily the same
across different sets of propensity scores (Hong, 2012), we found that splitting
into six strata for all five sets of propensity scores simplified the process and
sufficiently balanced the treatment groups.
(5) Calculate the marginal mean weights. The marginal mean weight for each
stratum was calculated after subclassification. The weighting values were used
as sample weights in the following analyses. For subjects assigned to stratum
s in treatment group t, the MMW-S weight (Hong, 2012) is calculated as
follows:
\[ W_{t,s} = \frac{N_t / N}{N_{t,s} / N_s} \]
where N_{t,s} is the number of subjects in treatment group t in stratum s, N_s is the number
of subjects in stratum s, N_t is the number of subjects assigned to treatment t, and N is
the total number of subjects.
To incorporate the complex sampling design of the PISA data, the final weight
was obtained by multiplying the sample weight with the MMW-S weight.
(6) Diagnose the balance. To establish reliable causal inference, it is necessary to
assess the balance among the treatment groups. The balance diagnostics involved
hypothesis testing on each covariate. Both weighted and unweighted analyses
were conducted to compare the balance before and after correction.
(7) Estimate the treatment effect. After a balanced sample was obtained, one-way
ANOVA analyses were conducted to determine the impact of inquiry teaching
on student science achievement and attitudes. The independent variable was
the treatment group, while the dependent variables were student science achievement,
interest in science, and support for scientific inquiry. This study differs
from conventional analysis because selection bias was substantially reduced by
weighting subjects on MMW-S weight.
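The stratification and weighting steps can be illustrated with a short sketch. It is not the authors' implementation: the function names and input layout are hypothetical, the propensity scores are assumed to be already estimated (one list per treatment level), and the weight is written as the marginal share of a treatment group divided by its share within the stratum, which is how the MMW-S weight is defined by Hong (2012).

```python
from collections import Counter


def assign_strata(scores, n_strata=6):
    """Split units into n_strata groups of near-equal size by score rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


def mmws_weights(treatment, scores_by_treatment, n_strata=6):
    """Return one MMW-S weight per unit.

    treatment: list with the treatment label received by each unit.
    scores_by_treatment (hypothetical layout): dict mapping each label t to the
        list of estimated propensity scores for treatment t, one per unit.
    """
    n = len(treatment)
    n_t = Counter(treatment)
    weights = [0.0] * n
    for t, scores in scores_by_treatment.items():
        strata = assign_strata(scores, n_strata)
        n_s = Counter(strata)  # all units in each stratum
        n_ts = Counter(s for s, z in zip(strata, treatment) if z == t)
        for i, (s, z) in enumerate(zip(strata, treatment)):
            if z == t:
                # marginal share of group t over its share within stratum s
                weights[i] = (n_t[t] / n) / (n_ts[s] / n_s[s])
    return weights


def final_weights(sample_weights, mmws):
    """Multiply the survey sample weight by the MMW-S weight, unit by unit."""
    return [sw * w for sw, w in zip(sample_weights, mmws)]
```

When every stratum contains at least one member of a treatment group, the weights of that group sum to the group's size, which is a quick sanity check on the computation.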
PISA statisticians used Item Response Theory (IRT) to estimate students’ math,
science, and reading scores. These IRT estimations provided five plausible values
(PVs) for each individual score. Therefore, in the PISA data set each student has
five PVs for math achievement score, five PVs for science, and five PVs for
reading. In this study, math and reading scores were used as covariates, while
science scores were used as an outcome measure. Propensity score analyses were
run five times, each time with one of the five sets of PVs (denoted PV1 – PV5).
For instance, the first run used PV1 math, PV1 reading, and PV1 science;
the second run used PV2 math, PV2 reading, and PV2 science; and so on.
After the five runs, the results from the runs were averaged to obtain the final
results, and standard deviations were adjusted with the pooled standard
deviations.
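The averaging over the five plausible-value runs can be sketched as follows. The text does not give the exact pooling formula, so this sketch assumes equal-sized runs and pools the standard deviation as the square root of the mean variance; the function name and the input format are hypothetical.

```python
from math import sqrt
from statistics import mean


def pool_plausible_values(run_results):
    """Combine the results of the separate plausible-value runs.

    run_results (hypothetical format): list of (group_mean, group_sd) tuples,
    one tuple per run (PV1 ... PV5).
    Returns (pooled_mean, pooled_sd).
    """
    means = [m for m, _ in run_results]
    sds = [sd for _, sd in run_results]
    pooled_mean = mean(means)
    # Assumption: runs have equal size, so variances are averaged with equal weight.
    pooled_sd = sqrt(mean(sd ** 2 for sd in sds))
    return pooled_mean, pooled_sd
```

The same function would be applied to each treatment group and each outcome in turn, once per reported mean and standard deviation.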
Results
Propensity score analyses were conducted separately for each country. In the follow-
ing sections, we first present detailed analysis for the US data, and then present sum-
marized results for analyses from all participating countries.
were kept for further analysis, while the other 329 students, roughly 9%, were
excluded for lack of common support. Those 3,410 students constitute the analytic
sample of the causal analysis in the study.
While it might be surprising, approximately 30% of the students reported the
highest level of openness in inquiry teaching, which means around 30% of US students,
mostly 10th graders, have a good chance to select their own investigation
topics in science lessons. On the other hand, around 19% of the students reported
the lowest level of openness in inquiry teaching, which means they sit and watch
their teachers during most science lessons. The percentages of students receiving
other levels of inquiry teaching are 9% in Level 1 (i.e. students conduct hands-on
activities), 26% in Level 2 (i.e. students draw conclusions from data), and 15% in
Level 3 (i.e. students design investigations). As shown in Table 4, the adjustment for common
support has only a marginal impact on the percentage of subjects in each treatment
group.
Based on the estimated logit propensity scores, six equal strata were created for each
of the five sets of logit propensity scores. Table 5 displays, for each treatment group,
the number of students in each stratum and the corresponding MMW-S weight of
students in that stratum.
Table 6. Between-treatment-group differences in logit propensity score before and after weighting
Estimation of causal effects. ANOVA analyses were used to examine the effects of the
level of inquiry teaching on student science achievement and attitudes (Table 9). It
was shown that the level of inquiry teaching has a statistically significant impact on
the three outcome measures (student science achievement, interest in science, and
support for scientific inquiry).
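For readers who want to see how weights enter such a test, the following is a generic sketch of a weighted one-way ANOVA. It is an assumption-laden illustration rather than the procedure used in the study: weights are treated as simple multipliers of each observation, degrees of freedom use the raw counts, and the replicate-weight variance estimation that PISA data normally call for is not handled. The function name and input layout are hypothetical.

```python
def weighted_anova(groups):
    """Weighted one-way ANOVA.

    groups (hypothetical layout): dict mapping a treatment label to a list of
    (outcome, weight) pairs. Returns (F statistic, R squared).
    """
    all_obs = [pair for obs in groups.values() for pair in obs]
    total_w = sum(w for _, w in all_obs)
    grand_mean = sum(y * w for y, w in all_obs) / total_w

    ss_between = 0.0
    ss_within = 0.0
    for obs in groups.values():
        w_g = sum(w for _, w in obs)
        mean_g = sum(y * w for y, w in obs) / w_g
        ss_between += w_g * (mean_g - grand_mean) ** 2
        ss_within += sum(w * (y - mean_g) ** 2 for y, w in obs)

    k = len(groups)   # number of treatment groups
    n = len(all_obs)  # assumption: raw counts define the degrees of freedom
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared
```

With all weights equal to 1 the function reduces to the ordinary one-way ANOVA, which makes it easy to check against a textbook example.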
What is more interesting than the omnibus test of the significance of the causal
effects is the pattern of change between the groups. Student science achievement
reached the highest point with Level 2 inquiry teaching (see Figure 2), which involves
students in frequently conducting activities and drawing conclusions but not designing
investigations or asking questions. This result is in accordance with the findings from
Minner et al.’s (2010) meta-analysis which favored the concluding component of
inquiry teaching. It is particularly interesting that the highest level of openness of
inquiry teaching, which involves students in frequently asking questions and other
inquiry teaching components, results in the lowest science achievement in students.
The difference between the highest achievement group and the lowest achievement
group is about 11% of the standard deviation of student science achievement.
On the other hand, increasing the level of inquiry teaching showed a consistently
positive effect on the two attitudinal measures, student interest in science, and
support for scientific inquiry (Figures 3 and 4). The differences between the
highest group and the lowest group are about 39% and 35% of a standard deviation
of the outcome, respectively.
Table 7. Differences on continuous covariates among the treatment groups before and after
weighting
                                                        Before weighting    After weighting
Covariate                                               F        p          F       p
Student variables
  ESCS (index of economic, social and cultural status)  16.12    <.01       0.30    .88
  Home educational resources                            20.50    <.01       1.12    .35
  Family wealth                                         13.12    <.01       0.41    .80
  Science courses taken in the last year                82.25    <.01       0.40    .81
  Science courses taken in this year                    18.28    <.01       0.25    .91
  Math achievement                                      15.90    <.01       0.88    .48
School variables
  Teacher–student ratio                                 2.61     .03        0.40    .81
  Size of English classes                               3.81     <.01       0.56    .70
  Proportion of certified teachers                      2.08     .08        0.07    .99
  School size                                           3.05     .02        0.57    .68
  Percent of students receiving free/reduced lunch      0.41     .80        0.91    .46
  Quality of educational resources                      1.73     .14        0.11    .98
  Activities to promote the learning of science         0.13     .97        0.74    .56
Table 8. Differences on categorical covariates among the treatment groups before and after
weighting
                                                              Before weighting    After weighting
Covariate                                               DF    χ²       p          χ²      p
Student variables
  Grade                                                 8     78.76    <.01       2.02    .98
  Gender                                                4     23.10    <.01       1.67    .80
  Science learning time: regular lessons                20    109.22   <.01       12.36   .90
  Science learning time: out-of school-time lessons     20    111.20   <.01       10.78   .95
  Science learning time: self-study or homework         20    134.07   <.01       6.48    .99
School variables
  School community                                      20    81.00    <.01       8.28    .99
  School type of ownership                              12    37.86    <.01       8.38    .75
  Students are grouped by ability into different classes 12   27.79    .01        4.94    .96
  Students are grouped by ability within their classes  12    19.89    .23        8.90    .92
  Fill all vacant 10th-grade science teaching positions 12    25.39    .06        8.17    .94
  Lack of qualified science teachers                    16    18.53    .29        8.88    .92
  Lack of laboratory technicians                        16    13.68    .32        3.94    .98
  Lack of science laboratory equipment                  16    12.85    .38        5.35    .95
  Science activities: clubs                             8     16.14    .04        3.12    .93
  Science activities: fairs                             8     31.46    <.01       2.51    .96
  Science activities: competitions                      8     17.80    .02        3.94    .86
  Science activities: projects                          8     10.69    .22        3.23    .92
  Science activities: trips                             8     9.51     .30        3.92    .86
  School concentrates science-related careers           12    16.21    .18        4.86    .96
extent the findings identified in the US data are consistent across all participating
countries.
For science achievement, Level 2 inquiry had the highest score in the
US data. Among the 56 participating countries, this result held in 22. In
addition, Level 2 inquiry had the second highest score in 18 countries. More specifically,
Level 2 inquiry had a higher score than Level 0 in 37 countries, higher than
Level 1 in 39 countries, higher than Level 3 inquiry in 41 countries, and higher
than Level 4 inquiry in 47 countries. As for Level 4 inquiry, it had the lowest
science achievement score in the US data. This result appears in 26 other countries.
In addition, Level 4 inquiry had the second lowest science achievement in 14
countries.
For science attitudes, the US data demonstrated that student interest in science and
support for scientific inquiry increased when the inquiry teaching level increased. In
all the participating countries, this pattern can be found in 19 countries for student
interest in science and 16 countries for student support for scientific inquiry. After
allowing for switching of just one position in the mean sequence (e.g. ‘02134’), the
two numbers increased to 40 countries for student interest in science and 31 countries
for student support for scientific inquiry. As for Level 4 inquiry teaching, it had the
highest student interest in science in 47 countries and the highest student support for
scientific inquiry in 42 countries.
Table 9. Effects of the level of inquiry teaching on science achievement and attitudes
Variable                         Level 0 M (SD)    Level 1 M (SD)    Level 2 M (SD)    Level 3 M (SD)    Level 4 M (SD)    F          R2
Science achievement
  Unweighted                     489.85 (99.98)    500.65 (92.93)    553.11 (89.23)    514.32 (92.55)    474.79 (98.38)    15.56∗     0.11
  Weighted                       506.80 (102.49)   502.93 (92.43)    516.84 (94.24)    506.72 (96.04)    497.57 (100.16)   12.69∗     0.01
Interest in science
  Unweighted                     444.57 (104.00)   464.43 (97.09)    465.29 (92.43)    478.90 (102.34)   501.22 (102.63)   146.85∗    0.05
  Weighted                       455.85 (103.87)   466.99 (91.90)    473.61 (94.17)    478.35 (100.65)   492.09 (98.25)    86.97∗     0.02
Support for scientific inquiry
                                                                                                                           113.69∗
Figure 2. The effect of the level of inquiry teaching on student science achievement after weighting
Figure 3. The effect of the level of inquiry teaching on student interest in science after weighting
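The "mean sequence" comparison described above can be sketched in code. The sketch assumes that a mean sequence lists the inquiry levels from lowest to highest mean (so a strictly increasing pattern reads '01234') and that "switching of just one position" means a single swap of two neighbouring levels, as in the example '02134'. Both readings are assumptions, and the function names are hypothetical.

```python
from typing import List


def mean_sequence(means: List[float]) -> str:
    """Levels ordered from lowest to highest mean, written as a string such as '01234'."""
    order = sorted(range(len(means)), key=lambda k: means[k])
    return "".join(str(k) for k in order)


def within_one_adjacent_swap(seq: str, target: str = "01234") -> bool:
    """True if `seq` equals `target` or differs from it by one swap of neighbouring positions."""
    if len(seq) != len(target):
        return False
    if seq == target:
        return True
    diffs = [i for i, (a, b) in enumerate(zip(seq, target)) if a != b]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and seq[diffs[0]] == target[diffs[1]]
        and seq[diffs[1]] == target[diffs[0]]
    )


# Weighted US means for interest in science, taken from Table 9 (Levels 0 to 4).
interest_us = [455.85, 466.99, 473.61, 478.35, 492.09]
seq = mean_sequence(interest_us)
print(seq, within_one_adjacent_swap(seq))
print(within_one_adjacent_swap("02134"))  # the example sequence quoted in the text
```

Counting the countries for which `within_one_adjacent_swap` holds would reproduce the relaxed tallies, under the stated assumptions.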
Discussion
Level of Inquiry and Content Acquisition
The propensity score analysis reported here reveals that there were significant effects
of the level of openness in inquiry teaching on student science achievement and
attitudes toward science. In itself, this is likely not a surprising result, but those
effects show different patterns on different outcome measures. For instance, with
respect to student science achievement, the highest score was achieved at Level 2
inquiry teaching, where students frequently conduct activities and draw conclusions from
data. For student science attitudes (interest in science and student support for
scientific inquiry), higher outcome scores were achieved when the level of inquiry
teaching was increased. These conclusions generally held in most of the PISA 2006
participating countries, thus providing a measure of reliability in the result.
Figure 4. The effect of the level of inquiry teaching on student support for scientific inquiry after weighting
Previous meta-analyses have demonstrated that inquiry teaching has inconsistent
effects on student science achievement. Some studies have shown positive effects,
and others have shown negative effects. The study reported here has shown that the
complexity of the impact of inquiry teaching on student science learning is related
to the level of openness in inquiry teaching. The effect of inquiry teaching is somewhat
complicated because our results show that increasing the level of openness in inquiry
teaching is not always beneficial to student science learning. Here, the students’
highest science achievement is obtained when they are involved in conducting activi-
ties and drawing conclusions from data only, rather than in higher level inquiry activi-
ties such as designing the investigation or raising their own questions. This pattern
was also identified in Minner et al.’s (2010) meta-analysis. In addition, our finding
adds to the evidence contrary to Kirschner et al.’s (2006) assertion that direct instruc-
tion is better than inquiry teaching, although we have already offered a cogent criti-
cism of the definition of inquiry used in their report.
Please note that the PISA science assessment items focus on students’ understand-
ing of science content, but are not designed to explicitly examine students’ epistemic
knowledge or understanding of the nature of science. Therefore, while it would be
interesting to gauge this domain directly, it is not possible to link the level of
inquiry to such aspects of student understanding through our analysis.
Table 10. Percent of student reports at each inquiry teaching level and mean sequences of science
achievement and attitudes for all participating countries
Columns: Country; n; percent of student reports at L0, L1, L2, L3, and L4; mean sequence for
science achievement, interest in science, and support for scientific inquiry
levels of inquiry. The key is to know how and when to apply any instructional
modality. The results of this study seem clear: it is wise to spend most of classroom
instructional time at Level 2 of inquiry teaching (students conduct hands-on activities
and draw conclusions from data). This will ensure that students learn science content
and gain insights about science.
Perhaps the most open-ended inquiry activities, in which students select their own
research topics and design their own investigations, should be used sparingly as
after-class projects or in science fair situations, while classroom instruction time is
devoted to projects at low levels of inquiry. Please note that this study does not
suggest that inquiry has no place in science instruction, as Kirschner et al. (2006)
might suggest with their unsophisticated view of this important teaching technique.
Rather, the proper level of inquiry should be applied with knowledge of the strengths
and limitations of each level.
Another useful outcome of this study is to have demonstrated the use of propensity
score analysis for multi-valued treatments, a fairly new technique, in educational
research. Based on the propensity score analysis, selection bias was controlled
for observed covariates, and a more valid causal inference was established for
estimating the effects of inquiry teaching on different components of student science
achievement and attitudes. However, our current analysis should be seen as preliminary,
and we are cautious about making causal claims. To establish more credible causal
inference, the next stage of analysis will involve sensitivity analysis on unobserved
covariates.
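To make the weighting step concrete, the sketch below computes marginal mean weights for a multi-valued treatment once strata are already available. It assumes the commonly cited form of the MMW-S weight, w = n_s × Pr(Z = z) / n_{z,s}, where n_s is the number of units in the stratum (formed on the propensity score for treatment z) and n_{z,s} is the number of those units that actually received z. The data, function names, and stratification are hypothetical, and the code is an illustration under these assumptions rather than the analysis carried out in this study.

```python
from collections import Counter
from typing import List


def mmws_weights(treatment: List[int], strata: List[List[int]]) -> List[float]:
    """Marginal mean weights through stratification (simplified sketch).

    treatment[i] is the level (0..K-1) received by unit i.
    strata[i][z] is unit i's stratum on the estimated propensity for level z.
    """
    n = len(treatment)
    levels = sorted(set(treatment))
    # Marginal probability of receiving each treatment level.
    pr = {z: treatment.count(z) / n for z in levels}
    # n_s: all units in stratum s of level z's propensity, whatever they received.
    n_s = Counter((z, strata[i][z]) for i in range(n) for z in levels)
    # n_zs: units in that stratum that actually received level z.
    n_zs = Counter((z, strata[i][z]) for i, z in enumerate(treatment))
    return [
        n_s[(z, strata[i][z])] * pr[z] / n_zs[(z, strata[i][z])]
        for i, z in enumerate(treatment)
    ]


def weighted_mean(values: List[float], weights: List[float]) -> float:
    """Weighted mean of `values`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical toy data: 8 units, 2 treatment levels, 2 strata per propensity score.
treatment = [0, 0, 0, 1, 0, 1, 1, 1]
strata = [[0, 0], [0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [1, 1], [1, 1]]
outcome = [480.0, 490.0, 500.0, 510.0, 520.0, 530.0, 540.0, 550.0]  # made-up scores

w = mmws_weights(treatment, strata)
for z in (0, 1):
    idx = [i for i, t in enumerate(treatment) if t == z]
    group_mean = weighted_mean([outcome[i] for i in idx], [w[i] for i in idx])
    print(z, round(group_mean, 2))
```

With weights of this form, units that are under-represented in their stratum for the treatment they received are weighted up, and the weights within each treatment group sum to that group's size when every stratum contains at least one unit receiving the treatment.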
Finally, we conclude with a word about a central limitation inherent in the current
analysis. This limitation relates to the lack of evidence for the validity of student
reports on the use of teaching strategies. We fully recognize that we are taking the
students' self-reported data at face value. How students interpreted the questions as
they responded to them is a major source of potential error in this analysis.
Without careful examination, it is questionable whether such student reports accurately
represent the reality of the teaching practices they witnessed. In addition, constraints
of the PISA data made it impossible to examine the teacher effect, so we can make no
statement about whether the effects of inquiry teaching are due to differences among
the teachers. Nevertheless, the findings of this study suggest that propensity score
analysis is a powerful new tool to apply in educational research settings and that
inquiry is as rich and complex a pedagogical tool as the best science teachers have
known for generations.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This research was supported by a grant from the American Educational Research
Association which receives funds for its ‘AERA Grants Program’ from the National
References
Anderson, R. D. (2007). Inquiry as an organizing theme for science curricula. In S. K. Abell & N. G.
Lederman (Eds.), Handbook of research on science education (pp. 807–830). Mahwah, NJ: Lawr-
ence Erlbaum Associates.
Bredderman, T. (1983). Effects of activity-based elementary science on student outcomes: A
quantitative synthesis. Review of Educational Research, 53(4), 499–518.
Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity
score matching. Journal of Economic Surveys, 22(1), 31–72.
Duschl, R. A. (2003). Assessment of inquiry. In J. M. Atkin & J. Coffey (Eds.), Everyday assessment
in the science classroom (pp. 41–59). Arlington, VA: NSTA Press.
Fan, X., & Nowell, D. L. (2011). Using propensity score matching in educational research. Gifted
Child Quarterly, 55(1), 74–79.
Frölich, M. (2004). Programme evaluation with multiple treatments. Journal of Economic Surveys,
18(2), 181–224.
Furtak, E. M., Seidel, T., Iverson, H., & Briggs, D. (2009). Recent experimental studies of inquiry-
based teaching: A meta-analysis and review. Presented at the 13th Biennial Conference of the
European Association for Research in Learning and Instruction, Amsterdam, Netherlands.
Herron, M. D. (1971). The nature of scientific enquiry. The School Review, 79(2), 171–212.
Hmelo-Silver, C. E., Duncan, R. G., & Chinn, C. A. (2007). Scaffolding and achievement in
problem-based and inquiry learning: A response to Kirschner, Sweller, and Clark (2006).
Educational Psychologist, 42(2), 99–107.
Hong, G. (2012). Marginal mean weighting through stratification: A generalized method for eval-
uating multivalued and multiple treatments with nonexperimental data. Psychological Methods,
17(1), 44–60.
Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions.
Biometrika, 87(3), 706–710.
Jiang, F., & McComas, W. F. (2012). The effects of level of openness in inquiry teaching on student science
achievement and attitudes: Evidence from propensity score analysis with PISA 2006 U.S. data. Paper
presented at the Annual Conference of National Association of Research in Science Teaching,
Indianapolis, IN.
Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does
not work: An analysis of the failure of constructivist, discovery, problem-based, experiential,
and inquiry-based teaching. Educational Psychologist, 41(2), 75–86.
Lechner, M. (2001). Identification and estimation of causal effects of multiple treatments under the
conditional independence assumption. In M. Lechner & F. Pfeiffer (Eds.), Econometric evalu-
ation of labour market policies (Vol. 13, pp. 43–58). Heidelberg: Springer.
Lott, G. W. (1983). The effect of inquiry teaching and advance organizers upon student outcomes in
science education. Journal of Research in Science Teaching, 20(5), 437–451.
Minner, D. D., Levy, A. J., & Century, J. (2010). Inquiry-based science instruction—what is it and
does it matter? Results from a research synthesis years 1984 to 2002. Journal of Research in
Science Teaching, 47(4), 474–496.
National Research Council. (1996). National science education standards. Washington, D.C.: The
National Academies Press.
Organisation for Economic Co-operation and Development. (2006). Assessing scientific, reading and
mathematical literacy: A framework for PISA 2006. Paris: OECD Publishing.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational
studies for causal effects. Biometrika, 70(1), 41–55.
Schroeder, C. M., Scott, T. P., Tolson, H., Huang, T.-Y., & Lee, Y. (2007). A meta-analysis of
national research: Effects of teaching strategies on student achievement in science in the
United States. Journal of Research in Science Teaching, 44(10), 1436–1460.
Schwab, J. J. (1962). The teaching of science as enquiry. In J. J. Schwab & P. F. Brandwein (Eds.),
The teaching of science (pp. 1–103). Cambridge, MA: Harvard University Press.
Shulman, L. S., & Tamir, P. (1973). Research on teaching in the natural sciences. In R. M. W.
Travers (Ed.), Second handbook of research on teaching: A project of the American educational
research association (pp. 1098–1148). Chicago, IL: Rand McNally.
Shymansky, J. A., Hedges, L. V., & Woodworth, G. (1990). A reassessment of the effects of
inquiry-based science curricula of the 60’s on student performance. Journal of Research in
Science Teaching, 27(2), 127–144.
Shymansky, J. A., Kyle Jr. W. C., & Alport, J. M. (1983). The effects of new science curricula on
student performance. Journal of Research in Science Teaching, 20(5), 387–404.
Spreeuwenberg, M. D., Bartak, A., Croon, M. A., Hagenaars, J. A., Busschbach, J. J. V., Andrea, H.,
. . . Twisk, J. (2010). The multiple propensity score as control for bias in the comparison of more
than two treatment arms: An introduction from a case study in mental health. Medical Care, 48,
166–174.
Stuart, E. A., & Rubin, D. B. (2008). Matching methods for causal inference: Designing
observational studies. In J. Osborne (Ed.), Best practices in quantitative methods (pp. 155–176).
London: Sage.
Treagust, D. F. (2007). General instructional methods and strategies. In S. K. Abell & N. G.
Lederman (Eds.), Handbook of research on science education (pp. 373–391). Mahwah, NJ: Lawrence
Erlbaum Associates.
Weinstein, T., Boulanger, F. D., & Walberg, H. J. (1982). Science curriculum effects in high school:
A quantitative synthesis. Journal of Research in Science Teaching, 19(6), 511–522.
Windschitl, M. (2008). What is inquiry? A framework for thinking about authentic scientific prac-
tice in the classroom. In J. Luft, R. L. Bell, & J. Gess-Newsome (Eds.), Science as inquiry in the
secondary setting (pp. 1–20). Arlington, VA: NSTA Press.
Wise, K. C., & Okey, J. R. (1983). A meta-analysis of the effects of various science teaching
strategies on achievement. Journal of Research in Science Teaching, 20(5), 419–435.