http://econtent.hogrefe.com/doi/pdf/10.1027/2151-2604/a000199 - Monday, October 19, 2015 4:37:14 AM - Humboldt-Universitaet zu Berlin IP Address:141.20.212.210

Scientific Reasoning in Higher Education
Constructing and Evaluating the Criterion-Related Validity of an Assessment of Preservice Science Teachers’ Competencies
Stefan Hartmann,¹ Annette Upmeier zu Belzen,¹ Dirk Krüger,² and Hans Anand Pant³

¹ Biology Education, HU Berlin, Germany; ² Biology Education, FU Berlin, Germany; ³ Institute for Quality Development in Education, HU Berlin, Germany
Abstract. The aim of this study was to develop a standardized test designed to measure preservice science teachers’ scientific reasoning skills, and to initially evaluate its psychometric properties. We constructed 123 multiple-choice items, using 259 students’ conceptions to generate highly attractive multiple-choice response options. In an item response theory-based validation study (N = 2,247), we applied multiple regression analyses to test hypotheses based on groups with known attributes. As predicted, graduate students performed better than undergraduate students, and students who studied two natural science disciplines performed better than students who studied only one. Contrary to our initial hypothesis, preservice science teachers performed less well than a control group of natural sciences students. Remarkably, an interaction effect of degree program (bachelor vs. master) and qualification (natural sciences student vs. preservice teacher) was found, suggesting that preservice science teachers’ learning opportunities to explicitly discuss and reflect on the inquiry process have a positive effect on the development of their scientific reasoning skills. We conclude that the evidence supports the criterion-related validity of our interpretation of the test scores as measures of scientific reasoning competencies.
Methods of scientific inquiry, such as experimenting and modeling, play a key role in scientific practice as well as in science classrooms (American Association for the Advancement of Science, 1993; Bybee, 2002; National Research Council, 2012; Popper, 2003). To become competent scientists, science students need to develop a deeper understanding about “the characteristics of the scientific enterprise and processes through which scientific knowledge is acquired” (Schwartz, Lederman, & Crawford, 2004, p. 611). How do theories work? What is the purpose of a scientific model? What is a good research question? How are hypotheses tested? What inferences can be drawn from empirical data? Scientists are not the only ones who should be able to answer these questions: To teach subjects like biology, chemistry, and physics effectively in school classrooms, science teachers too need to develop a conceptual understanding of the scientific method, and to acquire competencies in this area (Liu, 2010; Schwartz et al., 2004). In many teacher education programs, preservice science teachers therefore have to attend several academic lectures and practical classes that are held for natural sciences students. As an example, at German universities preservice science teachers’ curricula partly overlap with the curricula of natural sciences students during the undergraduate phase of academic training (e.g., Der Präsident der Humboldt-Universität zu Berlin, 2007a, 2007b, 2007c). In addition to science lectures, future biology, chemistry, and physics teachers attend seminars on teaching and instruction. In these seminars, scientific inquiry methods are taught as instruction techniques that can be used in science classrooms (Liu, 2010).

The importance of inquiry methods and scientific reasoning skills in academic programs for science students and preservice science teachers leads to the questions: How competent are science students and preservice science teachers in the field of scientific reasoning? How do their competencies develop during the phase of academic training? What differences can be found between students of different academic education programs?

To answer these questions, we analyze the development of science students’ and preservice science teachers’ competencies in the area of scientific reasoning. This article is about the first two stages of the project, the construction of a new test of scientific reasoning competencies and the evaluation of its criterion-related validity. We give a brief overview of the theoretical concept of scientific reasoning, describe the construction process of a standardized test, and present and discuss the findings of a study in which we investigated three aspects of criterion-related validity of the test. In a third stage of the project, the test will be used to assess the development of students’ competencies longitudinally from the beginning to the end of academic training.

Scientific Reasoning Competencies

Domain-specific cognitive skills or “competencies” (see Blömeke, Gustafsson, & Shavelson, 2015) are often characterized as “dispositions that are acquired and needed to successfully cope with certain situations or tasks” (Koeppen, Hartig, Klieme, & Leutner, 2008, p. 62). Within the domain of the natural sciences, the competencies that are needed to acquire scientific knowledge have been referred to as scientific thinking (Kuhn, Amsel, & O’Loughlin, 1988), scientific reasoning (Giere, Bickle, & Mauldin, 2006; Klahr, 2000; Koslowski, 1996), and scientific inquiry skills (Burns, Okey, & Wise, 1985; Liu, 2010). Following the hypothetico-deductive approach (Brody, de la Peña, & Hodgson, 1993; Godfrey-Smith, 2003; Popper, 2003), common inquiry techniques of the natural sciences are observing, comparing, experimenting, and modeling (Crawford & Cullin, 2005; Gott & Duggan, 1998; Grosslight, Unger, Jay, & Smith, 1991; Justi & Gilbert, 2003; Klahr, 2000; Zimmermann, 2007). “There is a general pattern to all scientific reasoning” (Giere et al., 2006, p. 6), which can be defined as a context-specific form of problem-solving. Mayer (2007) proposes a structural model of scientific reasoning that describes this pattern as a set of four subskills: formulating research questions, generating hypotheses, planning investigations, and analyzing and interpreting data (see also Liu, 2010). This model refers to the method of experimenting, but can also be applied to observing, comparing, and modeling (Wellnitz, Hartmann, & Mayer, 2010). However, certain aspects of scientific modeling are not covered by this model. In the study presented in this paper, Mayer’s structural model is therefore extended by three subskills of a scientific reasoning framework that refers to the inquiry method of scientific modeling (Krell, Upmeier zu Belzen, & Krüger, 2014). These subskills are defining the purpose of models, testing models, and changing models. Combining the subskills of both frameworks, seven aspects of scientific reasoning have been identified, each of which marks one step of a general inquiry process. Being competent in the field of scientific reasoning is defined as a cognitive disposition that enables students to apply each of these seven steps to real-life scientific problems.

Prior Research

“The obvious way to learn about scientific reasoning is by learning to be a scientist [but] this is not . . . the best way” (Giere et al., 2006, p. 6). In fact, both practical work (laboratories) and theoretical lessons (lectures) proved to be effective ways to support the development of scientific reasoning competencies in higher education (Lawson et al., 2000). Duschl and Grandy (2013) demonstrated that the most effective method to improve students’ scientific reasoning skills is lectures that “explicitly . . . involve building and refining questions, measurements, representations, models and explanations” (p. 2126). According to Hodson (2014), “there are major advantages in addressing scientific inquiry [and] making this understanding explicit to students” (p. 9).

Empirical studies in higher education showed that academic programs with longer and broader phases of specialized scientific training result in higher scientific reasoning skills (Kunz, 2012). Students adjust their reasoning strategies to the specific conditions of different content areas, and their overall skills increase as they proceed from domain to domain (Glaser, Schauble, Raghavan, & Zeitz, 1992). These skills are highly generalizable across different scientific disciplines (Godfrey-Smith, 2003).

According to these findings, three aspects have a positive effect on the development of scientific reasoning competencies: the number of learning opportunities, the variety of content domains in which scientific reasoning is trained, and teaching methods that explicitly address the inquiry process and the Nature of Science – in Hodson’s (2014) terms, “learning about science” rather than “learning science.” As the number of learning opportunities increases with the time students spend at universities, we conclude that science students’ and preservice science teachers’ competencies in scientific reasoning undergo significant progress during the phase of academic training. As learners’ scientific reasoning skills increase as they proceed through different content domains, learning opportunities in different natural sciences should also lead to increased scientific reasoning competencies. Finally, preservice science teachers, who learn about science by attending seminars in which they explicitly discuss and reflect on the inquiry process and the nature of science, should perform better in a scientific reasoning test than natural sciences students, who do not receive this kind of training.

Hypotheses

The long-term goal of our project is to evaluate the development of the scientific reasoning competencies of preservice science teachers. After a standardized test was constructed (see Method section of this article), we conducted a cross-sectional study to initially evaluate the test’s psychometric properties. In this study, criterion-related validity was tested using the known-groups method (Cronbach & Meehl, 1955; Hattie & Cooksey, 1984; Rupp & Pant, 2006). Following this approach, we formulated three hypotheses that predict differences in the test outcome between groups with known attributes:

Hypothesis (1): Graduate students perform better than undergraduate students, reflecting the positive effect of learning opportunities in academic training.

Hypothesis (2): Students who study two natural sciences perform better than students whose learning opportunities focus on one natural science, reflecting the positive effect of learning opportunities across different content domains.

Hypothesis (3): Preservice science teachers perform better than a control group of natural sciences students, reflecting the positive effect of seminars in which the inquiry process and the nature of science are explicitly discussed and reflected on.

Test Construction

In a first stage, we constructed 183 open-ended test items. A sample of 259 preservice science teachers and natural sciences students processed these items to generate written answers that reflect their conceptions of scientific reasoning. The test developers and the seven subject-matter experts categorized the written responses into scientifically adequate and non-adequate conceptions. The students’ answers were then used to formulate multiple-choice options. Answers that reflected the cognitive processes described in the construction manual were used as correct options, whereas alternative conceptions were used as distractors. The transfer of students’ conceptions into correct and incorrect answering options is expected to improve content validity, and in a test constructed this way, not only the correct responses but also the “distractors match common student ideas” (Sadler, 1998, p. 265).

Results

Item Parameters and Test Reliability

ACER ConQuest 3.0 (Adams, Wu, & Wilson, 2012) was used for data analysis. Plausible values (Wu, 2005) were drawn to estimate the students’ abilities. On average, the students responded to 17.42 test items (SD = 1.88). The range of person abilities was covered well by items with matching difficulties. Item parameter estimates ranged from −2.81 to 1.64. We found acceptable infit MNSQs, ranging from 0.93 to 1.09.
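The infit statistic used here can be made concrete with a small simulation. The sketch below (Python with NumPy) generates dichotomous responses under a Rasch model and computes each item’s information-weighted mean square (infit MNSQ). It is an illustration of the statistic only, not the ACER ConQuest analysis used in the study; the sample size, abilities, and difficulties are invented.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: 2,000 simulated persons and 5 items whose
# difficulties roughly span the range reported in the text.
n_persons = 2000
difficulty = np.array([-2.8, -1.0, 0.0, 0.9, 1.6])
theta = rng.normal(0.0, 1.0, size=n_persons)  # person abilities (logits)

# Rasch model: P(X = 1) is the logistic of (ability - difficulty).
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulty[None, :])))
x = (rng.random(p.shape) < p).astype(float)   # simulated 0/1 responses

def infit_mnsq(x, p):
    """Infit mean square per item: squared residuals summed over
    persons, divided by the summed binomial variances p * (1 - p).
    Values near 1.0 indicate good model-data fit."""
    return ((x - p) ** 2).sum(axis=0) / (p * (1.0 - p)).sum(axis=0)

print(infit_mnsq(x, p))  # values close to 1.0, since data fit the model
```

Because the responses are generated by the very model being checked, the statistic hovers around its expected value of 1.0; misfitting items would show values well below (overfit) or well above (underfit) 1.0.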
¹ Students of the following universities participated in the study: FU Berlin, HU Berlin, RWTH Aachen, TU Berlin, University of Bremen, University of Duisburg-Essen, University of Cologne, University of Potsdam (all Germany), University of Innsbruck, University of Salzburg, and University of Vienna (all Austria).
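The known-groups comparisons discussed in what follows amount to regressing ability estimates on dummy-coded group memberships, including a degree-by-qualification interaction. A minimal sketch under invented effect sizes (these coefficients are not the study’s estimates):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# Hypothetical dummy-coded predictors (0/1) mirroring the known-groups
# design: degree program, number of disciplines, and qualification.
graduate = rng.integers(0, 2, n)       # 0 = bachelor, 1 = master
two_subjects = rng.integers(0, 2, n)   # 1 = studies two natural sciences
preservice = rng.integers(0, 2, n)     # 1 = preservice science teacher

# Illustrative true effects (made up): positive degree and discipline
# effects, a negative main effect for preservice teachers, and a
# positive graduate-by-preservice interaction.
beta = np.array([0.0, 0.4, 0.3, -0.2, 0.5])
X = np.column_stack([np.ones(n), graduate, two_subjects,
                     preservice, graduate * preservice])
y = X @ beta + rng.normal(0.0, 1.0, n)  # simulated ability scores

# Ordinary least squares fit via numpy.linalg.lstsq.
est, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(est, 2))  # recovers the simulated effects approximately
```

With this coding, the interaction coefficient captures exactly the pattern described in the abstract: an extra benefit of graduate study that applies to preservice teachers but not to the control group.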
Discussion

With our first hypothesis, we predicted that graduate students perform better than undergraduate students. We found a significant regression effect of the variable degree program, supporting the hypothesis. This finding is in accordance with the increase of scientific reasoning skills during academic training that has been described in other studies (Glaser et al., 1992; Kunz, 2012; Lawson et al., 2000). As the group means differ in the expected directions, we conclude that this evidence provides support for the validity of our interpretation of the test scores as measures of scientific reasoning. However, one can think of alternative explanations for this result. Scientific reasoning is not the only skill that develops during the phase of academic training. As the test items refer to problems and phenomena of the natural sciences, the group differences could indicate an increase of a more general skill from this area, such as content knowledge. On the other hand, we believe that this risk was minimized by the extensive involvement of subject-matter experts in the item construction process. Another alternative explanation lies within the tested sample: Graduates can be seen as a selection of students who were high-achieving during the phase of undergraduate education, therefore biasing the results of a cross-sectional comparison of both groups. To test for competence development, longitudinal designs are needed. In the next stage of our project, we will conduct a longitudinal study to test preservice science teachers’ competencies four times during the phase of academic training.

The second hypothesis stated that students who study two natural sciences perform better than students who only study one natural science. Results of the regression analysis showed a positive regression coefficient for the variable discipline, supporting the hypothesis. This finding is in accordance with prior studies which found that a broader variety of learning opportunities leads to an increase in scientific reasoning skills (Glaser et al., 1992; Kunz, 2012; Lawson et al., 2000). Again, the group differences were in the expected direction, therefore supporting our interpretation of the test scores as measures of scientific reasoning.

The third hypothesis accounted for the learning opportunities in academic programs for preservice science teachers. In seminars on teaching and instruction, preservice teachers learn about science (Hodson, 2014) by discussing and reflecting on the inquiry process and the nature of science. That way, competencies in the field of scientific reasoning are explicitly trained. If our test measures such competencies, students who have had these specific learning opportunities should perform better in the test than students who have not had such opportunities (Duschl & Grandy, 2013; Giere et al., 2006). The results of the initial regression analysis did not support this hypothesis: Overall, a negative effect was found, indicating that preservice science teachers perform slightly lower than natural sciences students. Broken down by degree (undergraduate vs. graduate), however, the hypothesized effect was found for graduate students. A possible explanation lies within the graduate curricula of teacher education programs: Most of the seminars in which scientific reasoning and the nature of science are taught explicitly are held during the graduate phase of academic training (Der Präsident der Humboldt-Universität zu Berlin, 2007a). Therefore, the impact of these learning opportunities only shows for graduate students, but not for undergraduates. This finding is in accordance with our hypothesis. Again, this interpretation is hypothetical and needs longitudinal assessment to be tested. The next stage of this study will be used to further investigate this result.

The results also provide evidence to rate the psychometric properties of the final test instrument. The range of person abilities was covered well by items with matching parameter estimates, indicating that the test was neither too easy nor too difficult for the tested sample. All item infit MNSQs were close to the expected value of 1.00 (Bond & Fox, 2007). The item with the highest overfit had an infit MNSQ of 0.93; the infit of the item with the highest underfit was 1.09. Overall, these results indicate a good model-data fit, with only minor redundancies in the responses. In terms of item fit, we conclude that all 123 items are suitable for measurement.

The EAP/PV reliability of the test was 0.544. Even though this value is lower than for most psychological tests, it matches well with the reliabilities of standardized tests for scientific reasoning competencies (Mannel, 2011: 0.23–0.66; Neumann, 2011: 0.55; Terzer, 2013: 0.46; Wellnitz, 2012: 0.59). Yet, the EAP/PV reliability “seems to be of limited use as an index of measurement instrument quality” (Adams, 2005, p. 170), and low reliabilities “can be compensated for by larger samples” (Adams, 2006, p. 25). In studies that focus on the estimation of population parameters instead of the abilities of individual students, it is therefore “not uncommon . . . to implement tests that have low reliability” (Adams, 2005, p. 170).

Conclusion

The aim of this study was to construct a standardized test designed to examine preservice science teachers’ competencies in the field of scientific reasoning, and to initially evaluate its psychometric properties. We conducted a cross-sectional validation study (N = 2,247) in which we tested three hypotheses that predict differences between known groups. Three lines of evidence supported the validity of our interpretation of the test results as measures of scientific reasoning competencies: First, graduate students performed better than undergraduate students, reflecting the positive effect of the increasing number of learning opportunities during the time students spend at the university. Second, students who study two natural sciences performed better than students who study one natural science, reflecting the positive effect of learning opportunities that students receive as they proceed across different content domains within the natural sciences. The third hypothesis predicted that preservice science teachers perform better than a control group of natural sciences students. The hypothesized effect was only found for graduate students, but not for undergraduates. It reflects the positive impact of lectures and seminars in which preservice teachers explicitly discuss and reflect on the inquiry process. As the vast majority of these seminars are held during the graduate phase of academic training, the result supports our hypothesis. In addition to the empirical findings, the extensive involvement of subject-matter experts in the item construction process, and the use of students’ conceptions to construct multiple-choice answering options, further improve the validity by increasing both the relevance and the representativeness of the items.

The analyses presented in this paper examined validity rather than reliability. Even though evidence for validity provides indirect evidence for reliability as well, the low EAP/PV reliability found in this study is a subject of ongoing discussion, and improvements to the booklet design are currently under way. In the upcoming longitudinal phase of our project, we will also implement additional tools to evaluate different aspects of validity, for example, a thinking-aloud validation study and a multi-trait-multi-method (MTMM) study, including general cognitive ability, knowledge about the Nature of Science, and an alternative scientific reasoning test as covariates.

Overall, the findings led us to the conclusion that the test can be used for the upcoming stage of our project, where scientific reasoning competencies will be tested in a longitudinal design. This longitudinal study will provide insights into the competence development from the beginning to the end of academic training, and help us evaluate and improve the contents and structure of science teacher education.

References

Adams, R. J. (2005). Reliability as a measurement design effect. Studies in Educational Evaluation, 31, 162–172.
Adams, R. J. (2006, April). Reliability and item response modelling: Myths, observations and applications. Paper presented at the 13th International Objective Measurement Workshop, Berkeley, CA, USA.
Adams, R. J., Wilson, M. R., & Wu, M. L. (1997). Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22, 46–75.
Adams, R. J., Wu, M. L., & Wilson, M. R. (2012). ConQuest 3.0 [Computer software]. Camberwell, Australia: Australian Council for Educational Research.
American Association for the Advancement of Science. (1993). Benchmarks for science literacy: Project 2061. New York, NY: Oxford University Press.
Blömeke, S., Gustafsson, J.-E., & Shavelson, R. (2015). Beyond dichotomies: Competence viewed as a continuum. Zeitschrift für Psychologie, 223. doi: 10.1027/2151-2604/a000194
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences. London, UK: Erlbaum.
Brody, T. A., de la Peña, L., & Hodgson, P. E. (1993). The philosophy behind physics. Berlin, Germany: Springer.
Burns, J. C., Okey, J. R., & Wise, K. C. (1985). Development of an integrated process skill test: TIPS II. Journal of Research in Science Teaching, 22, 169–177.
Bybee, R. W. (2002). Teaching science as inquiry. In J. Minstrell & E. H. van Zee (Eds.), Inquiring into inquiry learning and teaching in science (pp. 20–46). Washington, DC: American Association for the Advancement of Science.
Crawford, B., & Cullin, M. (2005). Dynamic assessments of preservice teachers’ knowledge of models and modelling. In K. Boersma, M. Goedhart, O. de Jong, & H. Eijkelhof (Eds.), Research and the quality of science education (pp. 309–323). Dordrecht, The Netherlands: Springer.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Der Präsident der Humboldt-Universität zu Berlin. (Ed.). (2007a). Fachübergreifende Studienordnung für das Masterstudium für das Lehramt [Interdisciplinary study regulations for the Master of Education program]. Berlin, Germany: Humboldt University. Retrieved from http://www.amb.hu-berlin.de/2007/99/9920070
Der Präsident der Humboldt-Universität zu Berlin. (Ed.). (2007b). Studien- und Prüfungsordnung für das Bachelorstudium Biologie: Kernfach Biologie und Beifach Chemie im Monostudiengang. Beifach Biologie im Monostudiengang [Study regulations for the Bachelor program in biology: Major in biology and minor in chemistry. Minor in biology]. Berlin, Germany: Humboldt University. Retrieved from http://www.amb.hu-berlin.de/2007/62/6220070
Der Präsident der Humboldt-Universität zu Berlin. (Ed.). (2007c). Studien- und Prüfungsordnung für das Bachelorstudium Biologie: Kernfach und Zweitfach im Kombinationsstudiengang mit Lehramtsoption [Study regulations for the Bachelor program in biology: Major and minor in a dual program for preservice teachers]. Berlin, Germany: Humboldt University. Retrieved from http://www.amb.hu-berlin.de/2007/68/6820070
Duschl, R. A., & Grandy, R. (2013). Two views about explicitly teaching the nature of science. Science and Education, 22, 2109–2139.
Giere, R. N., Bickle, J., & Mauldin, R. F. (2006). Understanding scientific reasoning. Independence, KY: Wadsworth/Cengage Learning.
Glaser, R., Schauble, L., Raghavan, K., & Zeitz, C. (1992). Scientific reasoning across different domains. In E. Corte, M. Linn, H. Mandl, & L. Verschaffel (Eds.), Computer-based learning environments and problem solving (pp. 345–371). Heidelberg, Germany: Springer.
Godfrey-Smith, P. (2003). Theory and reality: An introduction to the philosophy of science. Chicago, IL: University of Chicago Press.
Gonzalez, E., & Rutkowski, L. (2010). Practical approaches for choosing multiple-matrix sample designs. IEA-ETS Research Institute Monograph, 3, 125–156.
Gott, R., & Duggan, S. (1998). Investigative work in the science curriculum. Buckingham, UK: Open University Press.
Grosslight, L., Unger, C., Jay, E., & Smith, C. (1991). Understanding models and their use in science: Conceptions of middle and high school students and experts. Journal of Research in Science Teaching, 28, 799–822.
Hattie, J. A., & Cooksey, R. W. (1984). Procedures for assessing the validity of tests using the “known groups” method. Applied Psychological Measurement, 8, 295–305.
Hodson, D. (2014). Learning science, learning about science, doing science: Different goals demand different learning methods. International Journal of Science Education, 36, 2534–2553. doi: 10.1080/09500693.2014.899722
Justi, R. S., & Gilbert, J. K. (2003). Teachers’ views on the nature of models. International Journal of Science Educa-
Rupp, A. A., & Pant, H. A. (2006). Validity theory. In N. J. Salkind (Ed.), Encyclopedia of measurement and statistics (pp. 1032–1035). Thousand Oaks, CA: Sage.
Sadler, P. M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in