
Closing the Loop: Involving Faculty in the Assessment of Scientific and Quantitative Reasoning Skills of Biology Majors
By Carol A. Hurney, Justin Brown, Heather Peckham Griscom, Erika Kancler, Clifton J. Wigtil, and Donna Sundre

The development of scientific and quantitative reasoning skills in undergraduates majoring in science, technology, engineering, and mathematics (STEM) is an objective of many courses and curricula. The Biology Department at James Madison University (JMU) assesses these essential skills in graduating biology majors by using a multiple-choice exam called the Natural World-9 (NW-9). NW-9, comprised of measures of Quantitative and Scientific Reasoning, contains items developed by faculty at JMU to assess the impact of the General Education program on the development of scientific and quantitative reasoning skills in a content-independent manner. We discuss methodology we used to involve faculty in determining the generalizability of NW-9 to assess the objectives of the biology curriculum and setting standards to interpret student achievement on NW-9. Student performance on NW-9 identified both strong and weak areas in our instruction and suggested that our biology faculty needs to reevaluate methodology for teaching students how to interpret and analyze data. More important, we can close the assessment loop by allowing faculty to participate in the assessment process and meaningfully reflect on student assessment results.

There are three options available to faculty interested in assessing the impact of undergraduate education on scientific and quantitative reasoning skills: use an existing instrument, modify an existing instrument, or develop a new instrument. Given the importance that science, technology, engineering, and mathematics (STEM) programs and national science organizations place on the development of scientific and quantitative reasoning skills, one would expect to find an endless array of reliable instruments that assess whether students graduating from undergraduate programs successfully acquired these essential skills (Howard Hughes Medical Institute 1996; NRC 2003). Many of the standardized tests, such as the Graduate Record Examination, include items that assess scientific reasoning ability, but for the most part research-based standardized tests address content knowledge (Bao et al. 2009). The Classroom Test of Scientific Reasoning developed by Lawson in 1978 is still popular among STEM educators, but this instrument addresses very broad areas of scientific reasoning and does not assess quantitative reasoning skills (Lawson 1978). Unfortunately, few readily accessible instruments are available that reliably assess both scientific and quantitative reasoning skills in undergraduates.

James Madison University (JMU) is a publicly funded, comprehensive institution of approximately 18,000 students in Harrisonburg, Virginia and has a strong emphasis on program assessment. The nationally recognized Center for Assessment and Research Studies (CARS) provides significant resources to the development of a nationally recognized assessment program.

Building on the need for assessment of scientific and quantitative reasoning in higher education, and more specifically to inform STEM education, members of CARS in partnership with JMU faculty developed the Natural World-9 (NW-9) instrument, which contains two components: the Scientific Reasoning Test (SR-9; Sundre 2008) and the Quantitative Reasoning Test (QR-9; Sundre, Thelk, and Wigtil 2008). All NW-9 items were written by James Madison University science and mathematics faculty to assess the objectives of the science component of the General Education program (see Table 1). Rather than investing faculty time in developing a new instrument, we decided to explore whether the NW-9 instrument developed and tested by CARS could assess scientific and quantitative reasoning skills in biology majors. We also wanted to involve faculty in this process to enhance faculty understanding and appreciation of the assessment process and results.

The Department of Biology has 56 full-time and part-time faculty, approximately 900 declared majors, and 100–125 students who graduate

Journal of College Science Teaching
each year. The biology curriculum is designed upon an explicit set of content, skill, and experience learning objectives developed by biology faculty. These objectives support the two major goals of the curriculum: ensuring that biology majors are literate in the scientific process and integrating research experiences into the learning environment for all our majors. Specifically, the skill objectives concentrate on scientific reasoning skills (see Table 1, skill objectives 1–10), but they also include objectives related to effective communication skills (see Table 1, skill objectives 11–14) and the ability to use quantitative reasoning skills to analyze biological phenomena (see Table 1, skill objectives 7 and 14). Assessment of the skill objectives is based on the results of two instruments, a modified version of the Academic Skills Inventory (ASI; Kruger and Zechmeister 2001) and the NW-9. The ASI differs from the NW-9 instrument in that the ASI asks students to report their experience level with a variety of academic skills, whereas the NW-9 instrument directly measures skill level. Results from the ASI indicate that students self-report behavioral gains in skills associated with written and oral communication, research methodology, and statistics (Seifert et al. 2009). Although the ASI provides insights regarding how well graduates of the biology major achieve some of the skill objectives, the NW-9 exam provides a more direct measurement of scientific and quantitative reasoning skills.

Although the NW-9 instrument was designed to assess the General Education learning objectives, there are many features of NW-9 that suggest this instrument will provide meaningful data to assess the skill objectives of the biology major. First, many of the General Education objectives are similar to the biology major skill objectives. For example, skill objectives 7, 9, and 10 and General Education objective 8 both discuss the ability of students to evaluate scientific sources, and skill objective 1 and General Education objective 6 both explore students' ability to distinguish between association and causation. Second, CARS has extensively tested both components of NW-9 to establish two important measures of a meaningful assessment instrument: reliability and validity. The NW-9 instrument reliability and validity scores suggest that the instrument consistently measures the scientific and quantitative reasoning objectives of the General Education program (Sundre 2008; Sundre, Thelk, and Wigtil 2008). Third, NW-9 items do not test specific content knowledge. Rather, many of the items provide content necessary to determine the answer (see Figure 1a), whereas other items test concepts that do not rely on factual information (see Figure 1b). Based on these features of NW-9, we determined the generalizability of the NW-9 instrument to assess the skill objectives of the biology major. We did this by involving biology faculty in a content alignment process in which they mapped NW-9 items to the skill objectives. We also involved faculty in the standard setting protocol to determine the standards for acceptable performance of our graduating biology seniors on items that mapped to the skill objectives. Results from these endeavors allow us to (1) evaluate senior biology major students' performance on the mapped items; (2) determine whether students fell below, met, or exceeded faculty standards; and (3) discuss NW-9 assessment results at

TABLE 1
Comparison of biology major skill objectives (N = 14) with General Education Cluster 3 objectives (N = 7).

Biology major skill objectives
1. Discriminate between association and causation, and identify the types of evidence used to establish causation.
2. Formulate a hypothesis and identify relevant variables necessary to test that hypothesis.
3. Design and execute experiments to test hypotheses.
4. Obtain data.
5. Organize data.
6. Analyze and interpret data.
7. Evaluate a statement, hypothesis, or claim using numerical or other evidence.
8. Locate sources of scientific information.
9. Evaluate the reliability of sources.
10. Critically evaluate a paper from the primary scientific literature.
11. Use effective professional communication in posters.
12. Use effective professional communication in lab reports.
13. Use effective professional communication in oral reports.
14. Use mathematics to understand and analyze biological phenomena.

General Education Cluster 3 objectives
1. Describe the methods of inquiry that lead to mathematical truth and scientific knowledge and be able to distinguish science from pseudoscience.
2. Use theories and models as unifying principles that help us understand natural phenomena and make predictions.
3. Recognize the interdependence of applied research, basic research, and technology, and how they affect society.
4. Illustrate the interdependence between developments in science and social and ethical issues.
5. Use graphical, symbolic, and numerical methods to analyze, organize, and interpret natural phenomena.
6. Discriminate between association and causation, and identify the types of evidence used to establish causation.
7. Formulate hypotheses, identify relevant variables, and design experiments to test hypotheses.
8. Evaluate the credibility, use, and misuse of scientific and mathematical information in scientific developments and public-policy issues.

Vol. 40, No. 6, 2011 19

departmental retreats regarding pedagogical strategies utilized by biology faculty to address the skill objectives.

Methods

Content alignment of NW-9 items to skill objectives

A critical step in determining the generalizability of the NW-9 instrument is to examine the content alignment between test items and skill objectives (D'Agostino et al. 2008). The degree of content alignment determines the ability of individual items to provide accurate information on student performance for each objective. Based on advice from the assessment experts at CARS, we utilized item-level analysis to determine content alignment of the NW-9 instrument to the skill objectives (Martone and Sireci 2009). This was accomplished by recruiting eight faculty members, representing various subdisciplines in biology, to analyze the 66 items on the NW-9 instrument. Each faculty member provided independent judgments on whether an item successfully assessed one or more of the skill objectives. Faculty members were asked to review one stated learning objective at a time and determine whether or not each NW-9 item successfully assessed that objective. A dichotomous choice was provided for each item (yes or no). After making judgments about one objective, the faculty member proceeded to the next skill objective. No additional discussions or attempts to form consensus took place. This objective-by-objective procedure is less arduous for faculty than attempting to simultaneously make judgments about individual items across all learning objectives (D'Agostino et al. 2008). In consultation with CARS, we developed a fairly stringent rule that an item would be deemed successfully mapped to a skill objective if six out of the eight evaluators (75%) assigned the item to a particular objective.

Establishing faculty standards

We used a modified Angoff method to establish a faculty standard for

FIGURE 1
Two examples of NW-9 items: (a) question that requires students to demonstrate proficiency in more than one skill, and (b) question that assesses the ability of students to interpret data.

(a) Regarding the two graphical displays given below, which of the following statements is correct?

a. Banebrook has the largest changes in temperature throughout the year.
b. Banebrook and Grove City temperatures exhibit exponential behavior throughout the year.
c. Neither of the above.

(b) Suppose a researcher wants to test the hypothesis that exposure to cadmium in childhood causes neurological damage that reduces IQ. The researcher randomly selects 500 fourth graders, monitors their cadmium exposure for one year, and then tests each student's IQ. The researcher finds that as cadmium exposure increases, IQ declines. Can the researcher conclude from the observed association between cadmium exposure and intelligence that cadmium causes reduced IQ?

a. No. The researcher did not include enough persons in the study.
b. No. There may be a third variable associated with exposure to cadmium that actually causes the lowered IQ.
c. Yes. The researcher followed the scientific method.
d. Yes. An association between the amount of cadmium exposure and lowered IQ is exactly what we would predict from the
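
The content alignment rule described in the Methods (an item counts as mapped to a skill objective when at least six of the eight faculty raters answer yes) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `mapped_objectives`, the item labels, and the yes/no judgments are hypothetical and invented for the example.

```python
# Hypothetical data layout: votes[item][objective] holds one yes/no
# judgment (True/False) per faculty rater. None of these values come
# from the study; they only illustrate the six-of-eight rule.
MIN_YES = 6  # six of eight raters (75%), the threshold stated in the Methods


def mapped_objectives(votes, min_yes=MIN_YES):
    """Return {item: sorted objectives endorsed by at least min_yes raters}."""
    result = {}
    for item, by_objective in votes.items():
        result[item] = sorted(
            objective
            for objective, judgments in by_objective.items()
            if sum(judgments) >= min_yes
        )
    return result


example_votes = {
    "item_01": {
        1: [True] * 7 + [False],       # 7 yes -> mapped
        6: [True] * 6 + [False] * 2,   # 6 yes -> mapped
        14: [True] * 5 + [False] * 3,  # 5 yes -> not mapped
    },
    "item_02": {
        3: [True] * 8,                 # 8 yes -> mapped
        4: [True] * 2 + [False] * 6,   # 2 yes -> not mapped
    },
}

alignment = mapped_objectives(example_votes)
print(alignment)
```

Because each objective is judged independently, a single item can map to several objectives, which is consistent with the article's observation that many items mapped to more than one skill objective.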


each skill objective to provide greater interpretive power regarding student results (Maurer et al. 1991). The Angoff method provides a quantitative benchmark to determine whether graduating seniors are meeting faculty expectations. Biology faculty members (n = 15) who had no knowledge of student test performance examined each of the NW-9 items that mapped to the skill objectives. The faculty volunteers were asked to provide a judgment of the percentage of graduating biology majors who should provide a correct response for each item. During this exercise, faculty members were asked to not discuss their ratings until after completion of the entire exercise. Following Angoff methodologies, faculty ratings for each item were grouped, on the basis of the mapping data, to the appropriate skill objectives. The mean of the scores for each skill objective represents the faculty standard for student success (see Table 2).

Determining student performance on NW-9

We administered the NW-9 instrument to 214 graduating seniors (88 in 2008 and 126 in 2009). The mean student scores on the suite of questions corresponding to each of the seven skill objectives were calculated and transformed to the percentage correct. For each objective, the faculty standards were compared with the performance of the graduating seniors using a Mann-Whitney U nonparametric test with sequential Bonferroni post hoc analysis (see Table 2). Cohen's d was used to determine effect size. If the mean student score for an objective was significantly higher than the faculty standard, students exceeded the faculty standard for that objective. If the mean student scores were not significantly different from the faculty standard, then students met the faculty standard. If the mean student score was significantly lower than the faculty standard, then students did not meet the faculty standards.

Results

Content alignment of NW-9 items to the skill objectives

The stringent content alignment activity we utilized revealed that 25 of the 66 items strongly mapped to 7 of the 14 skill objectives. The objectives for which items were successfully aligned relate to distinguishing association from causation, formulating and evaluating hypotheses, designing experiments, analyzing and interpreting data, and using mathematics to understand biological phenomena (see Table 2). We found that multiple items were assigned to each of these seven objectives. However, using the established criteria, there were no items that mapped to skill objectives relating to obtaining data; organizing data; locating sources of scientific information; evaluating the reliability of sources; critically evaluating a paper from the


TABLE 2
Number of NW-9 items mapped, faculty standard, and student performance for six skill objectives.

Skill objective                                        NW-9 items mapping       Faculty    Student
                                                       to objective             standard   performance

Student performance exceeded faculty standard
3. Design and execute experiments to test              3 items (5% of test)     84.8%      91.6% (p < .0001, d = .27)
   hypotheses.
14. Use mathematics to understand and analyze          2 items (3% of test)     74.8%      87.1% (p < .0001, d = .36)
    biological phenomena.

Student performance met faculty standard
1. Discriminate between association and causation,     6 items (9% of test)     79.3%      75.5% (p = .5920, d = .53)
   and identify the types of evidence used to
   establish causation.
2. Formulate a hypothesis and identify relevant        11 items (17% of test)   82.6%      86.5% (p = .050, d = .21)
   variables necessary to test that hypothesis.
7. Evaluate a statement, hypothesis, or claim          15 items (23% of test)   78.3%      75.7% (p = .9740, d = .17)
   using numerical or other evidence.

Student performance fell below faculty standard
6. Analyze and interpret data.                         23 items (33% of test)   81.0%      70.4% (p < .009, d = .58)

Note: The faculty standard was derived from biology faculty predicting the percentage of graduating biology majors whom they thought would provide a correct response for each item (refer to Faculty standards in the Results section). Student performance was the percentage of correct answers that mapped to each skill objective. For each objective, the faculty standards were compared with the performance of the graduating seniors using a Mann-Whitney U nonparametric test with sequential Bonferroni post hoc analysis. Cohen's d was used to determine the effect size (d). NW-9 = Natural World-9.
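
The decision logic behind the three groupings in Table 2 can be sketched in a few lines. The sketch rests on stated assumptions: "sequential Bonferroni" is taken to mean Holm's step-down procedure, Cohen's d is computed with a pooled standard deviation (the article does not say which variant was used), and the p-values below are illustrative stand-ins, since the table reports some of them only as bounds such as p < .0001. The means are the percentages printed in Table 2. The Mann-Whitney U test itself is not reimplemented here, and the helper names are hypothetical.

```python
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (one common variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value with alpha / (m - k)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # every larger p-value is also retained
        reject[i] = True
    return reject


def classify(student_mean, faculty_standard, significant):
    """Apply the exceeded / met / fell below rule described in the Methods."""
    if not significant:
        return "met"
    return "exceeded" if student_mean > faculty_standard else "fell below"


# Rows follow Table 2: objectives 3, 14, 1, 2, 7, 6.
student = [91.6, 87.1, 75.5, 86.5, 75.7, 70.4]
faculty = [84.8, 74.8, 79.3, 82.6, 78.3, 81.0]
# Illustrative p-values only (assumed, not reported exactly in the article).
p_values = [0.0001, 0.0004, 0.592, 0.050, 0.974, 0.009]

significant = holm_reject(p_values)
outcomes = [classify(s, f, sig) for s, f, sig in zip(student, faculty, significant)]
print(outcomes)

# Effect size on two small made-up samples, just to exercise the function.
example_d = cohens_d([90, 85, 95, 80], [80, 75, 85, 70])
```

Under these assumed p-values, the classification reproduces the grouping shown in Table 2; with different exact p-values the borderline row for objective 2 (p = .050) is the one most likely to change.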


primary literature; and using effective professional communication in posters, lab reports, and oral reports (skill objectives 4, 5, 8–13). This was an expected and validated finding. These learning objectives are not amenable to selected response item types. We currently use the ASI and are exploring other more direct methods to assess these skills and competencies. Some of the other NW-9 items that did not map to the skill objectives are designed to assess General Education objectives that do not align with faculty-developed curricular objectives of the biology major, such as understanding the difference between basic and applied research.

The most highly assessed objective was skill objective 6, analyzing and interpreting data, as 33% of the NW-9 items mapped to this objective (see Table 2). An example of an NW-9 item that assesses the ability of students to interpret data is shown in Figure 1a. Content in this item is not directly addressed in any biology course, which allows us to determine whether the student can transfer and generalize knowledge to interpret data in a situation in which they are not familiar with the content. Many items mapped to more than one skill objective, which reflects that many of the NW-9 items require students to demonstrate proficiency in more than one skill to achieve the correct answer. Overall, the content alignment activity provided validation for the use of NW-9 test scores to assess many of the quantitative and scientific reasoning objectives of our curriculum.

Faculty standards

For the most part, the faculty standards for each skill objective were in the 75%–85% range (see Table 2). The highest faculty standard, 84.8%, was for designing and executing experiments to test hypotheses, whereas the lowest, 74.8%, was for using mathematics to understand biological phenomena.

Student performance

Graduating biology majors exceeded faculty expectations for two skill objectives, met faculty expectations for three skill objectives, and fell below faculty standards for one skill objective (see Table 2). Seniors exceeded the faculty standard for designing experiments and using mathematics to understand biological phenomena (skill objectives 3 and 14). In particular, the average score for items that map to designing and executing experiments (skill objective 3) was 91.6%, which is much higher than the faculty standard of 84.8% (p = .0001, d = .36). Graduating seniors met the faculty standard for formulating hypotheses, discriminating between association and causation, and evaluating a statement or claim using evidence (skill objectives 1, 2, and 7). Finally, our assessment results indicate that seniors correctly answered 70.4% of the questions that map to skill objective 6, which is significantly lower than the faculty standard for analyzing and interpreting data (81.0%, p < .009, d = .58).

Discussion

As a result of this project, we have empirical evidence that the NW-9 provides meaningful measures of quantitative and scientific reasoning skills in biology majors. We found that 25 items on the NW-9 instrument map to seven of the skill objectives of the biology curriculum. The curriculum objectives assessed by NW-9 represent essential scientific and quantitative reasoning skills. Most notably, the exam scores provide insight into students' abilities to identify and evaluate evidence that can be used to establish causation, formulate hypotheses, identify relevant variables to test hypotheses, analyze and interpret data, and use mathematics to understand biological phenomena. We recognize that the skill objectives not assessed by NW-9 are difficult to evaluate with a multiple-choice exam (e.g., effectiveness in presenting scientific research), and we will seek new direct methodologies.

Faculty had an overall prediction that on average, 79% of the seniors would answer correctly the suite of NW-9 questions that mapped to the skill objectives. Faculty expectations across the objectives showed some variability, ranging from approximately 75%–85% correct. Some items and objectives were determined to be more challenging than others. Actual student performances ranged from approximately 70%–92% correct for a suite of questions mapped to a particular objective. Faculty expectations were highest (84.8% expected to answer correctly) for questions that mapped to the skill objective related to designing and executing experiments to test hypotheses. Student performance was also highest for questions that mapped to this objective (91.6% of the students answered these questions correctly).

Faculty standards were relatively high (>82.6% of students were predicted to answer the question correctly) for NW-9 questions referring to the skill objective of formulating hypotheses and designing and executing experiments to test hypotheses (see Table 2). This may be because experimental design is emphasized in biology courses, thus faculty members have higher expectations for these skills. The faculty standard was lowest for the objective related to using mathematics to answer biological phenomena (<75% of students were predicted to answer the questions correctly; see Table 2). This may be related to the difficulty of items that assess quantitative skills, but it could also reflect that faculty members do not feel confident that courses in the biology curriculum address these skills. Likewise, faculty standards for students' ability to analyze and interpret data were on the low end of the spectrum (81%). This is surprising given that many laboratory courses emphasize the use of statistics to analyze data.
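
The faculty standard discussed here is, per the Methods, the mean of the faculty Angoff ratings for the items mapped to an objective. A minimal sketch of that pooling step is given below. The function name `faculty_standards`, the item labels, and the ratings are hypothetical, and the sketch assumes that every rating for every mapped item is pooled before averaging; the article does not spell out whether item means were averaged instead (the two coincide when each item has the same number of raters).

```python
from statistics import mean


def faculty_standards(ratings, item_objectives):
    """Pool Angoff ratings by objective and average them.

    ratings[item]         -> list of rater judgments (percent expected correct)
    item_objectives[item] -> objectives the item was mapped to
    """
    pooled = {}
    for item, objectives in item_objectives.items():
        for objective in objectives:
            pooled.setdefault(objective, []).extend(ratings[item])
    return {objective: mean(values) for objective, values in pooled.items()}


# Made-up ratings from three hypothetical raters for two hypothetical items.
example_ratings = {
    "item_01": [80, 90, 70],
    "item_02": [85, 95, 75],
}
example_mapping = {
    "item_01": [1, 6],  # an item may map to more than one objective
    "item_02": [6],
}

standards = faculty_standards(example_ratings, example_mapping)
print(standards)
```

An item mapped to two objectives contributes its ratings to both standards, which mirrors the article's note that many items required proficiency in more than one skill.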


We found that the NW-9 exam can be used to assess many of the JMU Biology Department skill objectives, which are most likely similar to the objectives other Biology Departments have for their students. Our results demonstrate that the NW-9 exam can be used to assess scientific and quantitative reasoning skills in areas outside of the General Education curriculum. Institutions interested in implementing instruments, such as NW-9, should map the items to their curriculum objectives and set faculty standards, as these will vary with student populations, curriculum, and faculty expectations. Once student performance data is collected, faculty can identify areas of strength and weakness in instruction and/or curriculum.

Overall, results from the NW-9 instrument in conjunction with the results from the ASI (Seifert et al. 2009) suggest that the current biology major curriculum produces students who have met or exceeded faculty expectations for most of the specified curriculum skill objectives. We also noted a weak area in the curriculum regarding the skill of analyzing and interpreting data. This suggests a need for conversations to occur between laboratory instructors in regards to this essential skill objective. Laboratory courses should be targeted, because this is where the majority of inquiry-based learning occurs, such as analyzing and interpreting data. This study provides a baseline measure for the impact of the curriculum on skill development. We will continue to monitor our assessment results to measure the impact of changes we implement in laboratory courses to see if these changes increase student skill in data analysis.

One of the most significant outcomes we observed as we implemented our assessment design was an increase in faculty participation and interest in the assessment process and student results. By involving biology faculty in the content alignment and standard setting activities, we created a customized process that we can use, as a department, to analyze student performance in the areas of scientific and quantitative reasoning. More important, we have created a culture of assessment in our department that reflects the goals of the curriculum, the perspective of the faculty, and an awareness of student learning outcomes. This process has helped us to close the loop with understanding and using our assessment results. Our faculty conversations about assessment, our program, and our students' learning have been deepened and enriched. Most important, these results provide our faculty with compelling evidence that the NW-9 instrument measures many of the biology-major student learning objectives. We were able to engage many of our faculty in the development of a community-established expectation for student performance. Finally, this set of student performance expectations gave us a new and valued interpretive framework for our assessment results.

References

Bao, L., T. Cai, K. Koenig, K. Fang, J. Han, J. Wang, Q. Liu, et al. 2009. Learning and scientific reasoning. Science 323 (5914): 586–587.
D'Agostino, J.V., M.E. Welsh, A.D. Cimetta, L.D. Falco, S. Smith, W.H. VanWinckle, and S.J. Powers. 2008. The rating and matching item-objective alignment methods. Applied Measurement in Education 21 (1): 1–21.
Howard Hughes Medical Institute. 1996. Beyond Bio 101: The transformation of undergraduate biology education. Chevy Chase, MD: Howard Hughes Medical Institute. www.
Kruger, D.J., and E.B. Zechmeister. 2001. A skills-experience inventory for the undergraduate psychology major. Teaching of Psychology 28 (4): 249–253.
Lawson, A.E. 1978. The development and validation of a classroom test of formal reasoning. Journal of Research in Science Teaching 15 (1): 11–24.
Martone, A., and S.G. Sireci. 2009. Evaluating alignment between curriculum, assessment, and instruction. Review of Educational Research 79 (4): 1332–1361.
Maurer, T.J., R.A. Alexander, C.M. Callahan, J.J. Bailey, and F.H. Dambrot. 1991. Methodological and psychometric issues in setting cutoff scores using the Angoff method. Personnel Psychology 44 (2): 235–262.
National Research Council (NRC). 2003. Biology 2010: Transforming undergraduate education for future research biologists. Washington, DC: National Academies Press.
Seifert, K., C.A. Hurney, C.J. Wigtil, and D.L. Sundre. 2009. Using the academic skills inventory (ASI) to assess the biology major. Assessment Update 21 (12): 14–15.
Sundre, D.L. 2008. The Scientific Reasoning Test, Version 9 (SR-9) test manual. Harrisonburg, VA: Center for Assessment and Research Studies. www.madisonassessment.com/assessment-testing/scientific-reasoning-test/
Sundre, D.L., A. Thelk, and C. Wigtil. 2008. The Quantitative Reasoning Test, Version 9 (QR-9) test manual. Harrisonburg, VA: Center for Assessment and Research Studies. www. testing/quantitative-reasoning-test/

Carol A. Hurney is an associate professor of biology and executive director of the Center for Faculty Innovation, Justin Brown is an assistant professor of biology, Heather Peckham Griscom is an associate professor of biology, and Erika Kancler is an assistant professor of biology, all at James Madison University (JMU) in Harrisonburg, Virginia. Clifton J. Wigtil is a graduate student in gifted education at Purdue University. Donna Sundre is a professor of psychology and the executive director of the Center for Assessment and Research Studies at JMU.
