Scientific Reasoning in Higher Education

Article in Zeitschrift für Psychologie · January 2015


DOI: 10.1027/2151-2604/a000199



Original Article

Scientific Reasoning in Higher Education
Constructing and Evaluating the Criterion-Related
Validity of an Assessment of Preservice Science
Teachers’ Competencies
Stefan Hartmann,¹ Annette Upmeier zu Belzen,¹ Dirk Krüger,² and Hans Anand Pant³

¹Biology Education, HU Berlin, Germany; ²Biology Education, FU Berlin, Germany; ³Institute for Quality Development in Education, HU Berlin, Germany

Abstract. The aim of this study was to develop a standardized test designed to measure preservice science teachers' scientific reasoning skills, and to initially evaluate its psychometric properties. We constructed 123 multiple-choice items, using 259 students' conceptions to generate highly attractive multiple-choice response options. In an item response theory-based validation study (N = 2,247), we applied multiple regression analyses to test hypotheses based on groups with known attributes. As predicted, graduate students performed better than undergraduate students, and students who studied two natural science disciplines performed better than students who studied only one natural science discipline. In contrast to our initial hypothesis, preservice science teachers performed less well than a control group of natural sciences students. Remarkably, an interaction effect of the degree program (bachelor vs. master) and the qualification (natural sciences student vs. preservice teacher) was found, suggesting that preservice science teachers' learning opportunities to explicitly discuss and reflect on the inquiry process have a positive effect on the development of their scientific reasoning skills. We conclude that the evidence provides support for the criterion-based validity of our interpretation of the test scores as measures of scientific reasoning competencies.

Keywords: scientific reasoning skills, preservice teachers, assessment

Methods of scientific inquiry, such as experimenting and modeling, play a key role in scientific practice as well as in science classrooms (American Association for the Advancement of Science, 1993; Bybee, 2002; National Research Council, 2012; Popper, 2003). To become competent scientists, science students need to develop a deeper understanding of "the characteristics of the scientific enterprise and processes through which scientific knowledge is acquired" (Schwartz, Lederman, & Crawford, 2004, p. 611). How do theories work? What is the purpose of a scientific model? What is a good research question? How are hypotheses tested? What inferences can be drawn from empirical data? But not only scientists should be able to answer these questions: To teach subjects like biology, chemistry, and physics effectively in school classrooms, science teachers too need to develop a conceptual understanding of the scientific method and to acquire competencies in this area (Liu, 2010; Schwartz et al., 2004). In many teacher education programs, preservice science teachers therefore have to attend several academic lectures and practical classes that are held for natural sciences students. As an example, at German universities preservice science teachers' curricula partly overlap with the curricula of natural sciences students during the undergraduate phase of academic training (e.g., Der Präsident der Humboldt-Universität zu Berlin, 2007a, 2007b, 2007c). In addition to science lectures, future biology, chemistry, and physics teachers attend seminars on teaching and instruction. In these seminars, scientific inquiry methods are taught as instruction techniques that can be used in science classrooms (Liu, 2010).

The importance of inquiry methods and scientific reasoning skills in academic programs for science students and preservice science teachers leads to the questions: How competent are science students and preservice science teachers in the field of scientific reasoning? How do their competencies develop during the phase of academic training? What differences can be found between students of different academic education programs?

To answer these questions, we analyze the development of science students' and preservice science teachers' competencies in the area of scientific reasoning. This article is about the first two stages of the project, the construction

 2015 Hogrefe Publishing Zeitschrift für Psychologie 2015; Vol. 223(1):47–53



of a new test of scientific reasoning competencies and the evaluation of its criterion-related validity. We give a brief overview of the theoretical concept of scientific reasoning, describe the construction process of a standardized test, and present and discuss the findings of a study in which we investigated three aspects of criterion-related validity of the test. In a third stage of the project, the test will be used to assess the development of students' competencies longitudinally from the beginning to the end of academic training.

Scientific Reasoning Competencies

Domain-specific cognitive skills or "competencies" (see Blömeke, Gustafsson, & Shavelson, 2015) are often characterized as "dispositions that are acquired and needed to successfully cope with certain situations or tasks" (Koeppen, Hartig, Klieme, & Leutner, 2008, p. 62). Within the domain of the natural sciences, the competencies that are needed to acquire scientific knowledge have been referred to as scientific thinking (Kuhn, Amsel, & O'Loughlin, 1988), scientific reasoning (Giere, Bickle, & Mauldin, 2006; Klahr, 2000; Koslowski, 1996), and scientific inquiry skills (Burns, Okey, & Wise, 1985; Liu, 2010). Following the hypothetico-deductive approach (Brody, de la Peña, & Hodgson, 1993; Godfrey-Smith, 2003; Popper, 2003), common inquiry techniques of the natural sciences are observing, comparing, experimenting, and modeling (Crawford & Cullin, 2005; Gott & Duggan, 1998; Grosslight, Unger, Jay, & Smith, 1991; Justi & Gilbert, 2003; Klahr, 2000; Zimmermann, 2007). "There is a general pattern to all scientific reasoning" (Giere et al., 2006, p. 6), which can be defined as a context-specific form of problem solving. Mayer (2007) proposes a structural model of scientific reasoning that describes this pattern as a set of four subskills: formulating research questions, generating hypotheses, planning investigations, and analyzing and interpreting data (see also Liu, 2010). This model refers to the method of experimenting, but it can also be applied to observing, comparing, and modeling (Wellnitz, Hartmann, & Mayer, 2010). However, certain aspects of scientific modeling are not covered by this model. In the study presented in this paper, Mayer's structural model is therefore extended by three subskills of a scientific reasoning framework that refers to the inquiry method of scientific modeling (Krell, Upmeier zu Belzen, & Krüger, 2014). These subskills are defining the purpose of models, testing models, and changing models. Combining the subskills of both frameworks, seven aspects of scientific reasoning have been identified, each of which marks one step of a general inquiry process. Being competent in the field of scientific reasoning is defined as a cognitive disposition that enables students to apply each of these seven steps to real-life scientific problems.

Prior Research

"The obvious way to learn about scientific reasoning is by learning to be a scientist [but] this is not . . . the best way" (Giere et al., 2006, p. 6). In fact, both practical work (laboratories) and theoretical lessons (lectures) have proved to be effective ways to support the development of scientific reasoning competencies in higher education (Lawson et al., 2000). Duschl and Grandy (2013) demonstrated that the most effective methods to improve students' scientific reasoning skills are lectures that "explicitly . . . involve building and refining questions, measurements, representations, models and explanations" (p. 2126). According to Hodson (2014), "there are major advantages in addressing scientific inquiry [and] making this understanding explicit to students" (p. 9).

Empirical studies in higher education showed that academic programs with longer and broader phases of specialized scientific training result in higher scientific reasoning skills (Kunz, 2012). Students adjust their reasoning strategies to the specific conditions of different content areas, and their overall skills increase as they proceed from domain to domain (Glaser, Schauble, Raghavan, & Zeitz, 1992). These skills are highly generalizable across different scientific disciplines (Godfrey-Smith, 2003).

According to these findings, three aspects have a positive effect on the development of scientific reasoning competencies: the number of learning opportunities, the variety of content domains in which scientific reasoning is trained, and teaching methods that explicitly address the inquiry process and the Nature of Science. In Hodson's (2014) terms: "learning about science" rather than "learning science."

As the number of learning opportunities increases with the time students spend at universities, we conclude that science students' and preservice science teachers' competencies in scientific reasoning undergo significant progress during the phase of academic training. As learners' scientific reasoning skills increase as they proceed through different content domains, learning opportunities in different natural sciences should also lead to increased scientific reasoning competencies. Finally, preservice science teachers, who learn about science by attending seminars in which they explicitly discuss and reflect on the inquiry process and the nature of science, should perform better in a scientific reasoning test than natural sciences students, who do not receive this kind of training.

Hypotheses

The long-term goal of our project is to evaluate the development of the scientific reasoning competencies of preservice science teachers. After a standardized test was constructed (see the Method section of this article), we conducted a cross-sectional study to initially evaluate the test's psychometric properties. In this study, criterion-based validity was tested using the known-groups method (Cronbach & Meehl, 1955; Hattie & Cooksey, 1984; Rupp & Pant, 2006). Following this approach, we formulated three hypotheses that predict differences in the test outcome between groups with known attributes:

Hypothesis (1): Graduate students perform better than undergraduate students, reflecting the positive effect of learning opportunities in academic training.


Hypothesis (2): Students who study two natural sciences perform better than students whose learning opportunities focus on one natural science, reflecting the positive effect of learning opportunities across different content domains.

Hypothesis (3): Preservice science teachers perform better than a control group of natural sciences students, reflecting the positive effect of learning opportunities to explicitly discuss and reflect on the inquiry process.

Method

To assess students' scientific reasoning competencies, we developed a standardized paper-and-pencil test. Three item developers were provided with conceptual guidelines in the form of a test construction manual. The manual contained a detailed competence description for each of the seven subskills, alongside sample items. Following our framework of scientific reasoning, we constructed test items for each subskill (formulating questions, generating hypotheses, planning investigations, interpreting data, judging the purpose of models, testing models, and changing models). About one third of the items refer to real-life scientific problems from the field of biology, one third to chemistry, and one third to physics. To increase face validity as well as content validity, seven subject-matter experts and a psychometric consultant supervised the item construction process. Their supervision included critical discussion of the relevance and the representativeness of each item to the framework described in the test manual, as well as suggestions for improvement. All items underwent several revisions, and only items that eventually received no further comments, suggestions, or objections from the experts were approved for use in the test.

Test Construction

In a first stage, we constructed 183 open-ended test items. A sample of 259 preservice science teachers and natural sciences students processed these items to generate written answers that reflect their conceptions of scientific reasoning. The test developers and the seven subject-matter experts categorized the written responses into scientifically adequate and non-adequate conceptions. The students' answers were then used to formulate multiple-choice options. Answers that reflected the cognitive processes described in the construction manual were used as correct options, whereas alternative conceptions were used as distractors. The transfer of students' conceptions into correct and incorrect answering options is expected to improve content validity, and in a test constructed this way, not only the correct responses but also the "distractors match common student ideas" (Sadler, 1998, p. 265).

For 166 items, one correct and three incorrect options could be generated from the students' answers. For the remaining items, the answers given by the students were not sufficient to formulate the required number of multiple-choice options. These items were not used for further testing. To ensure that the correct multiple-choice options adequately represent the competencies described in the manual, all items underwent a final revision by the experts, thus enhancing content validity as well as face validity.

The psychometric properties of the 166 items were initially tested in a pilot study (N = 578). Based on item difficulty, discrimination parameters, and item characteristic curves, 123 items were selected for the final test.

Booklet Design and Sample

The 123 items were distributed across 41 item blocks. An unbalanced incomplete matrix design (Gonzalez & Rutkowski, 2010) was used to assign the blocks to 20 test booklets. Each booklet contained 6 blocks (18 items), two blocks for each content domain (biology, chemistry, physics). All booklets contained items for each of the seven subskills. The booklets were used in a cross-sectional study with 2,247 participants (55% female) at universities in Germany and Austria.¹ The sample consisted of 1,096 preservice science teachers in academic training and 1,151 natural sciences students. Two hundred thirteen students studied two natural sciences (e.g., biology and physics). Students were on average 22.45 years old (SD = 4.28). Three hundred fifty-five students were graduates. Differential Item Functioning (DIF) analyses (Adams, Wilson, & Wu, 1997) were conducted to ensure that the items measure the same underlying trait for each group of participants. For the vast majority of items, only negligible DIF was found.

¹ Students of the following universities participated in the study: FU Berlin, HU Berlin, RWTH Aachen, TU Berlin, University of Bremen, University of Duisburg-Essen, University of Cologne, University of Potsdam (all Germany), University of Innsbruck, University of Salzburg, and University of Vienna (all Austria).

Results

Item Parameters and Test Reliability

ACER ConQuest 3.0 (Adams, Wu, & Wilson, 2012) was used for data analysis. Plausible values (Wu, 2005) were drawn to estimate the students' abilities. On average, the students responded to 17.42 test items (SD = 1.88). The range of person abilities was covered well by items with matching difficulties. Item parameter estimates range from −2.81 to 1.64. We found acceptable infit MNSQs,


ranging from 0.93 to 1.09. The EAP/PV reliability estimate of the test was 0.544.
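Both statistics reported here can be made concrete with a short sketch. The formulas below are the standard information-weighted infit mean square and a common variance-ratio reading of the EAP/PV reliability; they are illustrative assumptions about these indices, not code taken from ConQuest:

```python
from statistics import mean, pvariance

def infit_mnsq(responses, probabilities):
    """Information-weighted mean-square fit for one dichotomous item:
    observed squared residuals divided by their model-expected variance."""
    num = sum((x - p) ** 2 for x, p in zip(responses, probabilities))
    den = sum(p * (1 - p) for p in probabilities)
    return num / den

def eap_pv_reliability(eap_scores, posterior_variances):
    """Share of total latent variance captured by the EAP point estimates
    (an approximation of the EAP/PV index reported by IRT software)."""
    signal = pvariance(eap_scores)
    return signal / (signal + mean(posterior_variances))

# Toy checks: misfit exactly matching the model expectation gives
# infit = 1.0, and equal signal and noise variance gives reliability 0.5.
fit = infit_mnsq([0, 1, 0, 1], [0.5] * 4)
rel = eap_pv_reliability([-1.0, 0.0, 1.0], [2 / 3] * 3)
assert abs(fit - 1.0) < 1e-9 and abs(rel - 0.5) < 1e-9
```

On this reading, a reliability of 0.544 says that roughly half of the variance in the ability estimates reflects true between-student differences, which is consistent with the later argument that such values are acceptable for population-level analyses.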

Differences Between Known Groups

To test our hypotheses, a latent regression model was applied (Adams et al., 1997). Latent regression allows us to examine group differences in the means of the ability parameters directly in the IRT model. Three independent variables were included as predictors of the latent variable. Degree program (0 = undergraduate program; 1 = graduate program) reflects the number of learning opportunities: Graduate students have had more learning opportunities than undergraduate students. Discipline (0 = one natural science discipline; 1 = two natural science disciplines) reflects the comprehensive scope of the learning contents: Students who study two natural science disciplines have to adapt their reasoning skills to a broader variety of content domains. Qualification (0 = science student; 1 = preservice science teacher) was added to test for the effect of seminars and lectures in which the inquiry process and the nature of science are extensively discussed. These seminars are mandatory for preservice science teachers in academic training at German universities. Natural sciences students were added as a control group because their academic training is similar to the training of science teachers, but they do not attend such seminars.

Analyses indicated no multicollinearity in the data. Results of the regression analysis are shown in Table 1. Our first hypothesis predicted that graduate students perform better than undergraduate students. The regression effect of the variable degree program reflects the achievement of graduate students in comparison to the achievement of undergraduate students. We found a significant effect for the variable, indicating higher achievement scores for the group of graduate students (Table 1). Our second hypothesis stated that students who study two natural science disciplines perform better than students who study one natural science. We found a significant effect of the variable discipline, indicating higher achievement for students who study two natural sciences (Table 1). Our third hypothesis predicted that preservice science teachers perform better than science students. The regression effect of the variable qualification reflects the difference between preservice science teachers' and science students' abilities. We found a very small but significant negative effect of the variable, meaning that preservice science teachers performed slightly lower than science students (Table 1).

Figure 1 shows the effect broken down by the phase of academic training. In the undergraduate phase, the mean achievement scores of preservice science teachers were lower than the mean achievement scores of science students, whereas the opposite effect was found in the graduate phase. If the interaction between degree program and qualification is included in the regression model, it shows a significant effect (Table 2).

[Figure 1. Mean achievement scores of natural sciences students and preservice science teachers, broken down by the phase of academic training (undergraduate vs. graduate). Error bars indicate ±2 SE.]

Table 1. Latent regression of qualification, discipline, and degree program on the achievement scale (predictor variables, unstandardized regression coefficients, standard errors)

Predictor variable                                    B          SE (B)
Qualification (1 = preservice science teacher)        −0.107**   0.021
Discipline (1 = two natural science disciplines)       0.116*    0.035
Degree program (1 = graduate)                          0.283**   0.028

Note. *p < .01. **p < .001.

Table 2. Latent regression of qualification, discipline, degree program, and the interaction of qualification and degree program on the achievement scale (predictor variables, unstandardized regression coefficients, standard errors)

Predictor variable                                                 B          SE (B)
Qualification (1 = preservice science teacher)                     −0.141**   0.023
Discipline (1 = two natural science disciplines)                    0.113*    0.036
Degree program (1 = graduate)                                       0.151*    0.045
Qualification × Degree program (1 = graduate preservice teacher)    0.218**   0.057

Note. *p < .01. **p < .001.
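Under the coding given above, the crossover between the two groups follows directly from the Table 2 coefficients by simple arithmetic. The sketch below assumes the negative sign of the qualification main effect reported in the running text and holds discipline at 0:

```python
# Predicted latent group means from the Table 2 coefficients.
# Assumption: the qualification main effect is negative (-0.141),
# matching the negative effect reported in the text.
B_QUAL, B_DEGREE, B_INTERACTION = -0.141, 0.151, 0.218

def predicted_mean(teacher, graduate):
    """Latent mean for one group (discipline held at 0)."""
    return B_QUAL * teacher + B_DEGREE * graduate + B_INTERACTION * teacher * graduate

undergrad_science = predicted_mean(0, 0)   # reference group
undergrad_teacher = predicted_mean(1, 0)   # main effect only
grad_science      = predicted_mean(0, 1)
grad_teacher      = predicted_mean(1, 1)   # main effects plus interaction

# The interaction flips the teacher/science-student ordering between phases:
assert undergrad_teacher < undergrad_science and grad_teacher > grad_science
```

The positive interaction term (0.218) outweighs the negative qualification effect (−0.141), which is exactly the pattern shown in Figure 1: preservice teachers start lower but end higher than the control group.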


Discussion

Overall, the findings were in accordance with the hypothesized group differences, thus providing evidence from which initial validity statements can be drawn. In our first hypothesis, we predicted that graduate students perform better than undergraduate students. We found a significant regression effect of the variable degree program, supporting the hypothesis. This finding is in accordance with the increase of scientific reasoning skills during academic training that has been described in other studies (Glaser et al., 1992; Kunz, 2012; Lawson et al., 2000). As the group means differ in the expected directions, we conclude that this evidence provides support for the validity of our interpretation of the test scores as measures of scientific reasoning. However, one can think of alternative explanations for this result. Scientific reasoning is not the only skill that develops during the phase of academic training. As the test items refer to problems and phenomena of the natural sciences, the group differences could indicate an increase of a more general skill from this area, such as content knowledge. On the other hand, we believe that this risk was minimized by the extensive involvement of subject-matter experts in the item construction process. Another alternative explanation lies within the tested sample: Graduates can be seen as a selection of students who were high-achieving during the phase of undergraduate education, therefore biasing the results of a cross-sectional comparison of both groups. To test for competence development, longitudinal designs are needed. In the next stage of our project, we will conduct a longitudinal study to test preservice science teachers' competencies four times during the phase of academic training.

The second hypothesis stated that students who study two natural sciences perform better than students who only study one natural science. Results of the regression analysis showed a positive regression coefficient for the variable discipline, supporting the hypothesis. This finding is in accordance with prior studies which found that a broader variety of learning opportunities leads to an increase in scientific reasoning skills (Glaser et al., 1992; Kunz, 2012; Lawson et al., 2000). Again, the group differences were in the expected direction, therefore supporting our interpretation of the test scores as measures of scientific reasoning.

The third hypothesis accounted for the learning opportunities in academic programs for preservice science teachers. In seminars on teaching and instruction, preservice teachers learn about science (Hodson, 2014) by discussing and reflecting on the inquiry process and the nature of science. That way, competencies in the field of scientific reasoning are explicitly trained. If our test measures such competencies, students who have had these specific learning opportunities should perform better in the test than students who have not had such opportunities (Duschl & Grandy, 2013; Giere et al., 2006). The results of the initial regression analysis did not support this hypothesis: Overall, a negative effect was found, indicating that preservice science teachers perform slightly lower than natural sciences students. Broken down by degree (undergraduate vs. graduate), the preservice teachers performed lower than the science students only in the undergraduate phase, but higher in the graduate phase of academic training. When the interaction of the variables qualification and degree program is added to the regression model, it shows a significant effect. A possible explanation lies within the graduate curricula of teacher education programs: Most of the seminars in which scientific reasoning and the nature of science are taught explicitly are held during the graduate phase of academic training (Der Präsident der Humboldt-Universität zu Berlin, 2007a). Therefore, the impact of these learning opportunities only shows for graduate students, but not for undergraduates. This finding is in accordance with our hypothesis. Again, this interpretation is hypothetical and needs longitudinal assessment to be tested. The next stage of this study will be used to further investigate this result.

The results also provide evidence to rate the psychometric properties of the final test instrument. The range of person abilities was covered well by items with matching parameter estimates, indicating that the test was neither too easy nor too difficult for the tested sample. All item infit MNSQs were close to the expected value of 1.00 (Bond & Fox, 2007). The item with the highest overfit had an infit MNSQ of 0.93; the infit of the item with the highest underfit was 1.09. Overall, these results indicate a good model-data fit, with only minor redundancies in the responses. In terms of item fit, we conclude that all 123 items are suitable for measurement.

The EAP/PV reliability of the test was 0.544. Even though this value is lower than those of most psychological tests, it matches well with the reliabilities of standardized tests of scientific reasoning competencies (Mannel, 2011: 0.23–0.66; Neumann, 2011: 0.55; Terzer, 2013: 0.46; Wellnitz, 2012: 0.59). Yet, the EAP/PV reliability "seems to be of limited use as an index of measurement instrument quality" (Adams, 2005, p. 170), and low reliabilities "can be compensated for by larger samples" (Adams, 2006, p. 25). In studies that focus on the estimation of population parameters instead of the abilities of individual students, it is therefore "not uncommon . . . to implement tests that have low reliability" (Adams, 2005, p. 170).

Conclusion

The aim of this study was to construct a standardized test designed to examine preservice science teachers' competencies in the field of scientific reasoning, and to initially evaluate its psychometric properties. We conducted a cross-sectional validation study (N = 2,247) in which we tested three hypotheses that predict differences between known groups. Three lines of evidence supported the validity of our interpretation of the test results as measures of scientific reasoning competencies: First, graduate students performed better than undergraduate students, reflecting the positive effect of the increasing number of learning opportunities during the time students spend at the university. Second, students who study two natural sciences


science, reflecting the positive effect of learning opportuni- Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model:
ties that students receive as they proceed across different Fundamental measurement in the human sciences. London,
content domains within the natural sciences. The third UK: Erlbaum.
Brody, T. A., de la Peña, L., & Hodgson, P. E. (1993).
hypothesis predicted that preservice science teachers per- The philosophy behind physics. Berlin, Germany: Springer.
form better than a control group of natural sciences stu- Burns, J. C., Okey, J. R., & Wise, K. C. (1985). Development of
http://econtent.hogrefe.com/doi/pdf/10.1027/2151-2604/a000199 - Monday, October 19, 2015 4:37:14 AM - Humboldt-Universitaet zu Berlin IP Address:141.20.212.210

dents. The hypothesized effect was only found for an integrated process skill test: TIPS II. Journal of Research
graduate students, but not for undergraduates. It reflects in Science Teaching, 22, 169–177.
the positive impact of lectures and seminars in which they Bybee, R. W. (2002). Teaching science as inquiry. In J. Minstrell
explicitly discuss and reflect on the inquiry process. As the & E. H. van Zee (Eds.), Inquiring into inquiry learning and
vast majority of these seminars are held during the graduate phase of academic training, the result supports our hypothesis. In addition to the empirical findings, the extensive involvement of subject-matter experts in the item construction process and the use of students' conceptions to construct multiple-choice answering options further improve validity by increasing both the relevance and the representativeness of the items.

The analyses presented in this paper examined validity rather than reliability. Even though evidence for validity provides indirect evidence for reliability as well, the low EAP/PV reliability found in this study is subject to ongoing discussion, and improvements to the booklet design are currently under way. In the upcoming longitudinal phase of our project, we will also implement additional tools to evaluate different aspects of validity, for example, a thinking-aloud validation study and a multitrait-multimethod (MTMM) study including general cognitive ability, knowledge about the Nature of Science, and an alternative scientific reasoning test as covariates.

Overall, the findings led us to the conclusion that the test can be used for the upcoming stage of our project, in which scientific reasoning competencies will be tested in a longitudinal design. This longitudinal study will provide insights into competence development from the beginning to the end of academic training, and it will help us evaluate and improve the contents and structure of science teacher education.
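As a side note on the EAP/PV reliability coefficient mentioned above: it can be read as the share of the estimated latent variance that is recovered by the expected a posteriori (EAP) person estimates, with the remainder being mean posterior (measurement) uncertainty. The following Python sketch is illustrative only; the function name and all numbers are our own and are not taken from the study or from any IRT software.

```python
# Illustrative sketch (not study data): EAP/PV-type reliability as the
# ratio of the variance of the EAP person estimates to the total latent
# variance (EAP variance plus mean posterior variance).

def eap_pv_reliability(eap_estimates, posterior_variances):
    """Share of total latent variance captured by the EAP estimates."""
    n = len(eap_estimates)
    mean_eap = sum(eap_estimates) / n
    var_eap = sum((x - mean_eap) ** 2 for x in eap_estimates) / n
    mean_posterior = sum(posterior_variances) / n
    return var_eap / (var_eap + mean_posterior)

# Widely spread EAPs with small posterior variance -> high reliability
high = eap_pv_reliability([-1.5, -0.5, 0.5, 1.5], [0.1, 0.1, 0.1, 0.1])

# Narrowly spread EAPs with large posterior variance -> low reliability,
# the situation a revised booklet design (more items per person) addresses
low = eap_pv_reliability([-0.2, -0.1, 0.1, 0.2], [0.8, 0.8, 0.8, 0.8])
```

The second case mirrors why a multi-matrix booklet design with few items per person can yield low EAP/PV reliability even when the test is valid: individual posteriors remain wide relative to the spread of person estimates.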
References

Adams, R. J. (2005). Reliability as a measurement design effect. Studies in Educational Evaluation, 31, 162–172.

Adams, R. J. (2006, April). Reliability and item response modelling: Myths, observations and applications. Paper presented at the 13th International Objective Measurement Workshop, Berkeley, CA, USA.

Adams, R. J., Wilson, M. R., & Wu, M. L. (1997). Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22, 46–75.

Adams, R. J., Wu, M. L., & Wilson, M. R. (2012). ConQuest 3.0 [Computer software]. Camberwell, Australia: Australian Council for Educational Research.

American Association for the Advancement of Science. (1993). Benchmarks for science literacy: Project 2061. New York, NY: Oxford University Press.

Blömeke, S., Gustafsson, J.-E., & Shavelson, R. (2015). Beyond dichotomies: Competence viewed as a continuum. Zeitschrift für Psychologie, 223. doi: 10.1027/2151-2604/a000194

teaching in science (pp. 20–46). Washington, DC: American Association for the Advancement of Science.

Crawford, B., & Cullin, M. (2005). Dynamic assessments of preservice teachers' knowledge of models and modelling. In K. Boersma, M. Goedhart, O. de Jong, & H. Eijkelhof (Eds.), Research and the quality of science education (pp. 309–323). Dordrecht, The Netherlands: Springer.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.

Der Präsident der Humboldt-Universität zu Berlin. (Ed.). (2007a). Fachübergreifende Studienordnung für das Masterstudium für das Lehramt [Interdisciplinary study regulations for the Master of Education program]. Berlin, Germany: Humboldt University. Retrieved from http://www.amb.hu-berlin.de/2007/99/9920070

Der Präsident der Humboldt-Universität zu Berlin. (Ed.). (2007b). Studien- und Prüfungsordnung für das Bachelorstudium Biologie: Kernfach Biologie und Beifach Chemie im Monostudiengang. Beifach Biologie im Monostudiengang [Study regulations for the Bachelor program in biology: Major in biology and minor in chemistry. Minor in biology]. Berlin, Germany: Humboldt University. Retrieved from http://www.amb.hu-berlin.de/2007/62/6220070

Der Präsident der Humboldt-Universität zu Berlin. (Ed.). (2007c). Studien- und Prüfungsordnung für das Bachelorstudium Biologie: Kernfach und Zweitfach im Kombinationsstudiengang mit Lehramtsoption [Study regulations for the Bachelor program in biology: Major and minor in a dual program for preservice teachers]. Berlin, Germany: Humboldt University. Retrieved from http://www.amb.hu-berlin.de/2007/68/6820070

Duschl, R. A., & Grandy, R. (2013). Two views about explicitly teaching the nature of science. Science and Education, 22, 2109–2139.

Giere, R. N., Bickle, J., & Mauldin, R. F. (2006). Understanding scientific reasoning. Independence, KY: Wadsworth/Cengage Learning.

Glaser, R., Schauble, L., Raghavan, K., & Zeitz, C. (1992). Scientific reasoning across different domains. In E. Corte, M. Linn, H. Mandl, & L. Verschaffel (Eds.), Computer-based learning environments and problem solving (pp. 345–371). Heidelberg, Germany: Springer.

Godfrey-Smith, P. (2003). Theory and reality: An introduction to the philosophy of science. Chicago, IL: University of Chicago Press.

Gonzalez, E., & Rutkowski, L. (2010). Practical approaches for choosing multiple-matrix sample designs. IEA-ETS Research Institute Monograph, 3, 125–156.

Gott, R., & Duggan, S. (1998). Investigative work in the science curriculum. Buckingham, UK: Open University Press.

Grosslight, L., Unger, C., Jay, E., & Smith, C. (1991). Understanding models and their use in science: Conceptions of middle and high school students and experts. Journal of Research in Science Teaching, 28, 799–822.

Hattie, J. A., & Cooksey, R. W. (1984). Procedures for assessing the validity of tests using the "known groups" method. Applied Psychological Measurement, 8, 295–305.

Zeitschrift für Psychologie 2015; Vol. 223(1):47–53  2015 Hogrefe Publishing



Hodson, D. (2014). Learning science, learning about science, doing science: Different goals demand different learning methods. International Journal of Science Education, 36, 2534–2553. doi: 10.1080/09500693.2014.899722

Justi, R. S., & Gilbert, J. K. (2003). Teachers' views on the nature of models. International Journal of Science Education, 25, 1369–1386.

Klahr, D. (2000). Exploring science: The cognition and development of discovery processes. Cambridge, MA: MIT Press.

Koeppen, K., Hartig, J., Klieme, E., & Leutner, D. (2008). Current issues in competence modeling and assessment. Zeitschrift für Psychologie, 216, 61–73.

Koslowski, B. (1996). Theory and evidence: The development of scientific reasoning. Cambridge, MA: MIT Press.

Krell, M., Upmeier zu Belzen, A., & Krüger, D. (2014). Students' levels of understanding models and modelling in biology: Global or aspect-dependent? Research in Science Education, 44, 109–132.

Kuhn, D., Amsel, E., & O'Loughlin, M. (1988). The development of scientific thinking skills. San Diego, CA: Academic Press.

Kunz, H. (2012). Professionswissen von Lehrkräften der Naturwissenschaften im Kompetenzbereich Erkenntnisgewinnung [Science teachers' professional knowledge in scientific inquiry] (Doctoral dissertation, University of Kassel, Germany). Retrieved from https://kobra.bibliothek.uni-kassel.de/bitstream/urn:nbn:de:hebis:34-2012012040403/9/DissertationHagenKunz.pdf

Lawson, A. E., Clark, B., Cramer-Meldrum, E., Falconer, K. A., Sequist, J. M., & Kwon, Y.-J. (2000). Development of scientific reasoning in college biology: Do two levels of general hypothesis-testing skills exist? Journal of Research in Science Teaching, 37, 81–101.

Liu, X. (2010). Using and developing measurement instruments in science education: A Rasch modeling approach. Charlotte, NC: Information Age Publishing.

Mannel, S. (2011). Assessing scientific inquiry: Development and evaluation of a test for the low-performing stage. Berlin, Germany: Logos.

Mayer, J. (2007). Erkenntnisgewinnung als wissenschaftliches Problemlösen [Inquiry as scientific problem solving]. In D. Krüger & H. Vogt (Eds.), Theorien in der biologiedidaktischen Forschung: Ein Handbuch für Lehramtsstudenten und Doktoranden (pp. 177–186). Berlin, Germany: Springer.

National Research Council. (2012). Education for life and work: Developing transferable knowledge and skills in the 21st century. Washington, DC: National Academy Press.

Neumann, I. (2011). Beyond physics content knowledge: Modeling competence regarding nature of science inquiry and nature of scientific knowledge. Berlin, Germany: Logos.

Popper, K. R. (2003). The logic of scientific discovery. London, UK: Routledge.

Rupp, A. A., & Pant, H. A. (2006). Validity theory. In N. J. Salkind (Ed.), Encyclopedia of measurement and statistics (pp. 1032–1035). Thousand Oaks, CA: Sage.

Sadler, P. M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35, 265–296.

Schwartz, R. S., Lederman, N. G., & Crawford, B. A. (2004). Developing views of nature of science in an authentic context: An explicit approach to bridging the gap between nature of science and scientific inquiry. Science Education, 88, 610–645.

Terzer, E. (2013). Modellkompetenz im Kontext Biologieunterricht – Empirische Beschreibung von Modellkompetenz mithilfe von Multiple-Choice Items [Model competence in biology education – empirical description of model competence using multiple-choice items] (Doctoral dissertation, HU Berlin, Germany). Retrieved from http://edoc.hu-berlin.de/dissertationen/terzer-eva-2012-12-19/PDF/terzer.pdf

Wellnitz, N. (2012). Kompetenzstruktur und -niveaus von Methoden der naturwissenschaftlichen Erkenntnisgewinnung [Competence structure and levels of methods of scientific inquiry]. Berlin, Germany: Logos.

Wellnitz, N., Hartmann, S., & Mayer, J. (2010). Developing a paper-and-pencil-test to assess students' skills in scientific inquiry. In G. Çakmakcı & F. Taşar (Eds.), Contemporary science education research: Learning and assessment (pp. 289–294). Ankara, Turkey: ESERA.

Wu, M. (2005). The role of plausible values in large-scale surveys. Studies in Educational Evaluation, 31, 114–128.

Zimmermann, C. (2007). The development of scientific thinking skills in elementary and middle school. Developmental Review, 27, 172–223.

Stefan Hartmann
HU Berlin
Biology Education
Invalidenstrasse 42
10115 Berlin
Germany
Tel. +49 30 2093-98306
Fax +49 30 2093-8311
E-mail stefan.hartmann@hu-berlin.de
