Chemistry Education Research and Practice

PAPER

Development and validation of an instrument to measure undergraduate chemistry students’ critical thinking skills†

Stephen M. Danczak,* Christopher D. Thompson and Tina L. Overton

Cite this: DOI: 10.1039/c8rp00130h
Received 18th May 2018, Accepted 28th June 2019
Published on 12 July 2019.
rsc.li/cerp

The importance of developing and assessing student critical thinking at university can be seen through its inclusion as a graduate attribute for universities and from research highlighting the value employers, educators and students place on demonstrating critical thinking skills. Critical thinking skills are seldom explicitly assessed at universities. Commercial critical thinking assessments, which are often generic in context, are available. However, the literature suggests that assessments that use a context relevant to the students more accurately reflect their critical thinking skills. This paper describes the development and evaluation of a chemistry critical thinking test (the Danczak–Overton–Thompson Chemistry Critical Thinking Test, or DOT test), set in a chemistry context and designed to be administered to undergraduate chemistry students at any level of study. Development and evaluation occurred over three versions of the DOT test through a variety of quantitative and qualitative reliability and validity testing phases. The studies suggest that the final version of the DOT test has good internal reliability, strong test–retest reliability, moderate convergent validity relative to a commercially available test, and is independent of previous academic achievement and university of study. Criterion validity testing revealed that third year students performed statistically significantly better on the DOT test than first year students, and that postgraduates and academics performed statistically significantly better than third year students. The statistical and qualitative analysis indicates that the DOT test is a suitable instrument for the chemistry education community to use to measure the development of undergraduate chemistry students’ critical thinking skills.

Introduction

The term ‘critical thinking’ or expressions referring to critical thinking skills and behaviours such as ‘analyse and interpret data meaningfully’ can be found listed in the graduate attributes of many universities around the world (Monash University, 2015; University of Adelaide, 2015; University of Melbourne, 2015; Ontario University, 2017; University of Edinburgh, 2017). Many studies highlight motivations for the development of higher education graduates’ critical thinking skills from the perspective of students and employers. When 1065 Australian employers representing a range of industries were surveyed it was found that employers considered critical thinking to be the second most important skill or attribute behind active learning, and that over 80% of respondents indicated critical thinking as ‘important’ or ‘very important’ as a skill or attribute in the workplace (Prinsley and Baranyai, 2015). International studies have revealed similar results (Jackson, 2010; Desai et al., 2016). These findings are indicative of the persistent needs of the job market for effective new graduates and the expectations that graduates are able to demonstrate skills such as critical thinking (Lowden et al., 2011).

A survey of 167 recent science graduates compared the development of a variety of skills at university to the skills used in the workplace (Sarkar et al., 2016). It found that 30% of graduates in full-time positions identified critical thinking as one of the top five skills they would like to have developed further within their undergraduate studies. Students, governments and employers all recognise that not only is developing students’ critical thinking an intrinsic good, but that it better prepares them to meet and exceed employer expectations when making decisions, solving problems and reflecting on their own performance (Lindsay, 2015). Hence, it has become somewhat of an expectation from governments, employers and students that it is the responsibility of higher education providers to develop students’ critical thinking skills. Yet, despite the clear need to develop these skills, measuring student attainment of critical thinking is challenging.

* School of Chemistry, Monash University, Victoria 3800, Australia. E-mail: stephen.danczak@monash.edu
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c8rp00130h

The definition of critical thinking

Three disciplines dominate the discussion around the definition of critical thinking: philosophy, cognitive psychology and

This journal is © The Royal Society of Chemistry 2019 Chem. Educ. Res. Pract.

education research. Among philosophers, one of the most commonly cited definitions of critical thinking is drawn from the Delphi Report, which defines critical thinking as ‘purposeful, self-regulatory judgement which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgement is based’ (Facione, 1990, p. 2). Despite being developed over 25 years ago this report is still relevant and the definition provided is still commonly used in recent literature (Abrami et al., 2015; Desai et al., 2016; Stephenson and Sadler-Mcknight, 2016).

Cognitive psychologists and education researchers use the term critical thinking to describe a set of cognitive skills, strategies or behaviours that increase the likelihood of a desired outcome (Halpern, 1996b; Tiruneh et al., 2014). Psychologists typically investigate critical thinking experimentally and have developed a series of reasoning schemas with which to study and define critical thinking: conditional reasoning, statistical reasoning, methodological reasoning and verbal reasoning (Nisbett et al., 1987; Lehman and Nisbett, 1990). Halpern (1993) expanded on these schemas to define critical thinking as the thinking required to solve problems, formulate inferences, calculate likelihoods and make decisions.

In education research there is often an emphasis on critical thinking as a skill set (Bailin, 2002) or on putting critical thought into tangible action (Barnett, 1997). Dressel and Mayhew (1954) suggested it is educationally useful to define critical thinking as the sum of specific behaviours which could be observed in student acts. They identified these critical thinking abilities as identifying central issues, recognising underlying assumptions, evaluating evidence or authority, and drawing warranted conclusions. Bailin (2002) raises the point that, from a pedagogical perspective, many of the skills or dispositions commonly used to define critical thinking are difficult to observe and, therefore, difficult to assess. Consequently, Bailin suggests that the concept of critical thinking should explicitly focus on adherence to criteria and standards to reflect ‘good’ critical thinking (Bailin, 2002, p. 368).

It appears that there are several definitions of critical thinking of equally valuable meaning (Moore, 2013). There is agreement across much of the field that meta-cognitive skills, such as self-evaluation, are essential to a well-rounded process of critical thinking (Glaser, 1984; Kuhn, 1999; Pithers and Soden, 2000). There are key themes, such as ‘critical thinking: as judgement, as scepticism, as originality, as sensitive reading, or as rationality’, which can be identified across the literature. In the context of developing an individual’s critical thinking it is important that these themes take the form of observable behaviours.

Developing students’ critical thinking

There are two extreme views regarding the teaching of critical thinking and the role subject-specific knowledge plays in its development: the subject specifist view and the subject generalist view. The subject specifist view, championed by McPeck (1981), states that thinking is never without context and thus courses designed to teach informal logic in an abstract environment provide no benefit to the student’s capacity to think critically (McPeck, 1990). This perspective is supported by the work of prominent psychologists in the early 20th century (Thorndike and Woodworth, 1901a, 1901b, 1901c; Inhelder and Piaget, 1958).

In the latter half of the 20th century informal logic gained academic credence as it challenged the previous idea that logic related purely to deduction or inference, proposing that there were, in fact, theories of argumentation and logical fallacies (Johnson et al., 1996). These theories began to be taught at universities as standalone courses, free from any context, in efforts to teach the structure of arguments and the recognition of fallacies using abstract theories and symbolism. Cognitive psychology research lent evidence to the argument that critical thinking could be developed within a specific discipline and that those reasoning skills were, at least to some degree, transferable to situations encountered in daily life (Lehman et al., 1988; Lehman and Nisbett, 1990). These perspectives form the basis of the subject generalist view, which holds that critical thinking can be developed independent of subject-specific knowledge.

McMillan (1987) carried out a review of 27 empirical studies conducted at higher education institutions where critical thinking was taught, either in standalone courses or integrated into discipline-specific courses such as science. The review found that standalone and integrated courses were equally successful in developing critical thinking, provided critical thinking developmental goals were made explicit to the students. The review also suggested that the development of critical thinking was most effective when its principles were taught across a variety of discipline areas so as to make knowledge retrieval easier.

Ennis (1989) suggested that there are a range of approaches through which critical thinking can be taught: general, where critical thinking is taught separate from content or ‘discipline’; infusion, where the subject matter is covered in great depth and teaching of critical thinking is explicit; immersion, where the subject matter is covered in great depth but critical thinking goals are implicit; and mixed, a combination of the general approach with either the infusion or immersion approach. Ennis (1990) arrived at a pragmatic view, conceding that the best critical thinking occurs within one’s area of expertise, or domain specificity, but that critical thinking can still be effectively developed with or without discipline-specific knowledge (McMillan, 1987; Ennis, 1990).

Many scholars remain entrenched in the debate regarding the role discipline-specific knowledge has in the development of critical thinking. For example, Moore (2011) rejected the use of critical thinking as a catch-all term to describe a range of cognitive skills, believing that to teach critical thinking as a set of generalisable skills is insufficient to provide students with an adequate foundation for the breadth of problems they will encounter throughout their studies. Conversely, Davies (2013) accepts that critical thinking skills share fundamentals at the basis of all disciplines and that there can be a need to accommodate discipline-specific needs ‘higher up’ in tertiary education via the infusion approach. However, Davies considers the specifist approach to developing critical thinking ‘dangerous and wrong-headed’

(Davies, 2013, p. 543), citing government reports and primary literature which demonstrate tertiary students’ inability to identify elements of arguments, and championing the need for standalone critical thinking courses.

Pedagogical approaches to developing critical thinking in chemistry in higher education range from writing exercises (Oliver-Hoyo, 2003; Martineau and Boisvert, 2011; Stephenson and Sadler-Mcknight, 2016), inquiry-based projects (Gupta et al., 2015), flipped lectures (Flynn, 2011) and open-ended practicals (Klein and Carney, 2014) to gamification (Henderson, 2010) and work integrated learning (WIL) (Edwards et al., 2015). Researchers have demonstrated the benefits of developing critical thinking skills across all first, second and third year programs of an undergraduate degree (Phillips and Bond, 2004; Iwaoka et al., 2010). Phillips and Bond (2004) indicated that such interventions help develop a culture of inquiry and better prepare students for employment.

Some studies demonstrate the outcomes of teaching interventions via validated, commercially available critical thinking tests, available from a variety of vendors for a fee (Abrami et al., 2008; Tiruneh et al., 2014; Abrami et al., 2015; Carter et al., 2015). There are arguments against the generalisability of these commercially available tests. Many academics believe assessments need to closely align with the intervention(s) (Ennis, 1993), and a more accurate representation of student ability is obtained when a critical thinking assessment is related to a student’s discipline, as students attach greater significance to the assessment (Halpern, 1998).

Review of commercial critical thinking assessment tools

A summary of the most commonly used commercial critical thinking skills tests, the style of questions used in each test and the critical thinking skills these tests claim to assess can be found in Table 1. For the purposes of this research the discussion will focus primarily on the tests and teaching tools used within the higher education setting. Whilst this list may not be exhaustive, it highlights those most commonly reported in the literature. The tests described are the California Critical Thinking Skills Test (CCTST) (Insight Assessment, 2013), the Watson-Glaser Critical Thinking Appraisal (WGCTA) (AssessmentDay Ltd, 2015), the Watson-Glaser Critical Thinking Appraisal Short Form (WGCTA-S) (Pearson, 2015), the Cornell Critical Thinking Test Level Z (CCTT-Z) (The Critical Thinking Co., 2017), the Ennis–Weir Critical Thinking Essay Test (EWCTET) (Ennis and Weir, 1985), and the Halpern Critical Thinking Assessment (HCTA) (Halpern, 2016). All of the tests use a generic context, and participants require no discipline-specific knowledge in order to make a reasonable attempt at the tests. Each test is accompanied by a manual containing specific instructions, norms, validity, reliability and item analysis. The CCTST, WGCTA and WGCTA-S are available as two versions (often denoted version A and version B) to facilitate pre- and post-testing. The tests are generally untimed, with the exception of the CCTT-Z and the EWCTET.

Several reviews of empirical studies suggest that the WGCTA is the most prominent test in use (Behar-Horenstein and Niu, 2011; Carter et al., 2015; Huber and Kuncel, 2016). However, the CCTST was developed much later than the WGCTA, and recent trends suggest the CCTST has gained popularity amongst researchers since its inception. Typically, the tests are administered to address questions regarding the development of critical thinking over time or the effect of a teaching intervention. The results of this testing are inconsistent; some studies report significant changes while others report no significant changes in critical thinking (Behar-Horenstein and Niu, 2011). For example, Carter et al. (2015) found that studies which used the CCTST or the WGCTA did not all support the hypothesis of improved critical thinking with time, with some studies reporting increases, and others reporting decreases or no change over time. These reviews highlight the importance of experimental design when evaluating critical thinking. McMillan (1987) reviewed 27 studies and found that only seven of them demonstrated significant changes in critical thinking. He concluded that tests designed by the researcher are a better measure of critical thinking, as they specifically address the desired critical thinking learning outcomes, as opposed to commercially available tools which attempt to measure critical thinking as a broad and generalised construct.
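The changes over time reported in these studies are generally established by comparing mean test scores between cohorts or time points. As a purely illustrative sketch (the cohort scores and sizes below are invented, not drawn from any study cited here), Welch's unequal-variance t statistic for such a comparison can be computed as:

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom
    for comparing two independent group means."""
    na, nb = len(sample_a), len(sample_b)
    ma, mb = mean(sample_a), mean(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    se2_a, se2_b = va / na, vb / nb                  # squared standard errors
    t = (ma - mb) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Hypothetical critical thinking scores (out of 30) for two cohorts
first_year = [14, 16, 13, 18, 15, 17, 12, 16]
third_year = [19, 21, 18, 22, 17, 20, 23, 19]
t, df = welch_t(third_year, first_year)
print(round(t, 2), round(df, 1))
```

The resulting t statistic would then be compared against the t distribution with the computed degrees of freedom to judge significance, which is the step on which the inconsistent findings above turn.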

Table 1 Summary of commonly used commercially available critical thinking tests

California Critical Thinking Skills Test (CCTST) (Insight Assessment, 2013)
  Question structure: 40 multiple choice items
  Skills assessed: analysis, evaluation, inference, deduction and induction

Watson-Glaser Critical Thinking Appraisal (WGCTA) (AssessmentDay Ltd, 2015)
  Question structure: 80 multiple choice items
  Skills assessed: inference, deduction, drawing conclusions, making assumptions and assessing arguments

Watson-Glaser Critical Thinking Appraisal Short Form (WGCTA-S) (Pearson, 2015)
  Question structure: 40 multiple choice items
  Skills assessed: inference, deduction, drawing conclusions, making assumptions and assessing arguments

Cornell Critical Thinking Test Level Z (CCTT-Z) (The Critical Thinking Co., 2017)
  Question structure: 52 multiple choice items
  Skills assessed: induction, deduction, credibility, identification of assumptions, semantics, definition and prediction in planning experiments

Ennis–Weir Critical Thinking Essay Test (EWCTET) (Ennis and Weir, 1985)
  Question structure: eight paragraphs, presented as letters containing errors in critical thinking, and an essay written in response to these paragraphs
  Skills assessed: understanding the point, seeing reasons and assumptions, stating one’s point, offering good reasons, seeing other possibilities, and responding appropriately and/or avoiding poor argument structure

Halpern Critical Thinking Assessment (HCTA) (Halpern, 2016)
  Question structure: 20 scenarios or passages followed by a combination of 25 multiple choice, ranking or rating alternatives and 25 short answer responses
  Skills assessed: reasoning, argument analysis, hypothesis testing, likelihood and uncertainty analysis, decision making and problem solving
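The manuals accompanying these tests, like the analyses applied to the DOT test in the Method below, report internal reliability and item analysis. For dichotomously scored multiple choice items both reduce to simple computations over a student-by-item score matrix; the sketch below is illustrative only, using an invented 5-student, 4-item score matrix (Cronbach's alpha, which for 0/1 items is equivalent to KR-20, and item difficulty as the proportion of examinees answering correctly):

```python
from statistics import pvariance  # population variance, as used in alpha/KR-20

def item_difficulty(responses):
    """Proportion of examinees answering each item correctly.
    responses: one 0/1 list per student, one entry per item."""
    n = len(responses)
    k = len(responses[0])
    return [sum(student[i] for student in responses) / n for i in range(k)]

def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(responses[0])
    item_vars = [pvariance([student[i] for student in responses]) for i in range(k)]
    totals = [sum(student) for student in responses]
    return k / (k - 1) * (1 - sum(item_vars) / pvariance(totals))

# Hypothetical 0/1 scores: 5 students x 4 items
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
print(item_difficulty(scores))
print(round(cronbach_alpha(scores), 3))
```

In practice a much larger response matrix is needed for a stable estimate; the point of the sketch is only to show what the reported reliability and item difficulty figures are computed from.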


The need for a contextualised chemistry critical thinking test

The desire to develop the critical thinking skills of students at higher education institutions has led to the design and implementation of a breadth of teaching interventions, and the development of a range of methods of assessing the impact of these interventions. Many of these assessment methods utilise validated, commercially available tests. However, there is evidence to suggest that if these assessments are to be used with tertiary chemistry students, the context of the assessments should be in the field of chemistry so that the students may better engage with the assessment in a familiar context (McMillan, 1987; Ennis, 1993; Halpern, 1998), and consequently, students’ performance on a critical thinking test may better reflect their actual critical thinking abilities.

Several examples of chemistry-specific critical thinking tests and teaching tools were found in the literature. However, while all of these tests and teaching activities were set within a chemistry context, they require discipline-specific knowledge and/or were not suitable for very large cohorts of students. For example, Jacob (2004) presented students with six questions, each consisting of a statement requiring an understanding of declarative chemical knowledge. Students were expected to select whether the conclusion was valid, possible or invalid and provide a short statement to explain their reasoning. Similarly, Kogut (1993) developed exercises where students were required to note observations and underlying assumptions of chemical phenomena, then develop hypotheses and experimental designs with which to test these hypotheses. However, understanding the observations and underlying assumptions was dependent on declarative chemical knowledge, such as trends in the periodic table or the ideal gas law.

Garratt et al. (1999) developed an entire book dedicated to developing chemistry critical thinking, titled ‘A Question of Chemistry’. In writing this book the authors took the view that thinking critically in chemistry draws on the generic skills of critical thinking and what they call ‘an ethos of a particular scientific method’ (Garratt et al., 2000, p. 153). The approach to delivering these questions ranged from basic multiple choice questions, to rearranging statements to generate a cohesive argument, to open-ended responses. The statements preceding the questions are very discipline-specific and the authors acknowledge they are inaccessible to a lay person. Overall, the chemistry context is used because ‘it adds to the students’ motivation if they can see the exercises are firmly rooted in, and therefore relevant to, their chosen discipline’ (Garratt et al., 2000, p. 166).

Thus, an opportunity was identified to develop a chemistry critical thinking test which could be used to assist chemistry educators and chemistry education researchers in evaluating the effectiveness of teaching interventions designed to develop the critical thinking skills of chemistry undergraduate students. This study aimed to determine whether a valid and reliable critical thinking test could be developed and contextualised within the discipline of chemistry, yet independent of any discipline-specific knowledge, so as to accurately reflect the critical thinking ability of chemistry students from any level of study, at any university. This study describes the development and reliability and validity testing of an instrument with which to measure undergraduate chemistry students’ critical thinking skills: the Danczak–Overton–Thompson Chemistry Critical Thinking Test (DOT test).

Method

The development of the DOT test occurred in five stages (Fig. 1): the development of an operational definition of critical thinking, the writing of the pilot DOT test, and the evaluation of three iterations of the test (DOT V1, DOT V2 and DOT V3). Data obtained in Stage 1 were compared with definitions and critical thinking pedagogies described in the literature to facilitate the development of a functional definition of critical thinking which informed the development of the DOT test. During Stage 2, data from Stage 1 were used to identify which elements of critical thinking the DOT test would measure, and to identify commercially available test(s) suitable as guides for question development. Stage 3 consisted of internal reliability and content validity testing of the DOT V1. At Stage 4, a cross-sectional group of undergraduate chemistry students undertook the DOT V2 and the WGCTA-S to determine test–retest reliability, convergent validity and content validity of the DOT V2. Finally, in Stage 5, internal reliability, item difficulty, criterion validity and discriminant validity of the DOT V3 were determined via a cross-sectional study. The test was administered to first year and third year chemistry undergraduates at Monash University, third year chemistry undergraduates at Curtin University, and a range of PhD students, post-doctoral researchers and academics drawn from Monash University and an online chemistry education research community of practice.

As critical thinking tests are considered to evaluate a psychometric construct (Nunnally and Bernstein, 1994), there must be supporting evidence of their reliability and validity (Kline, 2005; DeVellis, 2012). The changes made to each iteration of the DOT test and the qualitative and quantitative analysis performed at each stage of the study are described below.

Ethics

All participants in the study were informed that their participation was voluntary and anonymous, would in no way affect their academic or professional records, and that they were free to withdraw from the study at any time. Participants were provided with an explanatory statement outlining these terms, and all procedures were approved in accordance with Monash University Human Research Ethics Committee (MUHREC) regulations (project number CF16/568-2016000279).

Qualitative data analysis

The major qualitative element of this study sought to examine how participants engaged with critical thinking tests. It was interested in understanding the effect of using scientific terminology in a critical thinking test, what information participants perceived as important, what they believed the questions were asking them, and the reasoning underpinning their responses to test questions. These qualitative investigations aimed to understand individuals’ perceptions of critical thinking

Fig. 1 Flow chart of methodology consisting of writing and evaluating the reliability and validity of iterations of the DOT test.

and synthesise them into generalisable truths to improve the critical thinking test. Thus, the core research was informed by constructivism as a framework applied within the context of chemistry education, as presented in the review by Ferguson (2007). Research questions that are investigated using constructivism are based on the theory that ‘knowledge is constructed in the mind of the learner’ (Ferguson, 2007, p. 28). This knowledge is refined and shaped by the learner’s surroundings and social interactions in what is referred to as social constructivism (Ferguson, 2007, p. 29).

The qualitative data for DOT V1 and DOT V2 were treated as separate studies. The data for these studies were collected and analysed separately as described in the following. The data collected from focus groups throughout this research were recorded with the permission of the participants and transcribed verbatim into Microsoft Word, at which point participants were de-identified. The transcripts were then imported into NVivo version 11 and an initial analysis was performed to identify emergent themes. The data then underwent a second analysis to ensure any underlying themes were identified. A third review of the data used a redundancy approach to combine similar themes. The final themes were then used for subsequent coding of the transcripts (Bryman, 2008).

Developing an operational definition of critical thinking

Previous work (Danczak et al., 2017) described that chemistry students, teaching staff and employers predominantly identified deductive logic elements of critical thinking, such as ‘analysis’ and ‘problem solving’, and neglected to describe inductive logic elements, such as ‘judgement’ or ‘inference’, typical of the literature on critical thinking (Facione, 1990; Halpern, 1996b). Therefore, students, teaching staff and employers did not define critical thinking in the holistic fashion of philosophers, cognitive psychologists or education researchers. In fact, very much in line with the constructivist paradigm (Ferguson, 2007), participants seem to have drawn on elements of critical thinking relative to the environments in which they had previously been required to use critical thinking. For example, students focused on analysis and problem solving, possibly due to the assessment-driven environment of university, whereas employers cited innovation and global contexts, likely reflective of a commercial environment.

The definitions of critical thinking in the literature (Lehman et al., 1988; Facione, 1990; Halpern, 1996b) cover a wide range of skills and behaviours. These definitions often imply that to think critically necessitates that all of these skills or behaviours be demonstrated. However, it seems almost impossible that all of these attributes could be observed at a given point in time, let alone assessed (Dressel and Mayhew, 1954; Bailin, 2002). Whilst the students in Danczak et al. (2017) used ‘problem solving’ and ‘analysis’ to define critical thinking, it does not necessarily mean that their description accurately reflects the critical thinking skills they have actually acquired, but rather their perception of what critical thinking skills they have developed. Therefore, to base a chemistry critical thinking test solely on analysis and problem solving skills would lead to


the omission of the assessment of other important aspects of critical thinking.

To this end, the operational definition of critical thinking acknowledges the analysis and problem solving focus that students predominantly used to describe critical thinking, whilst expanding into other important aspects of critical thinking such as inference and judgement. Consequently, guidance was sought from existing critical thinking assessments, as described below.

Development of the Danczak–Overton–Thompson chemistry critical thinking test (DOT test)

At the time of test development only a privately licenced version of the California Critical Thinking Skills Test (CCTST) (Insight Assessment, 2013) and a practice version of the Watson-Glaser Critical Thinking Appraisal (WGCTA) (AssessmentDay Ltd, 2015) were able to be accessed. Several concessions had to be made with respect to the practicality of using commercial products: cost, access to questions and solutions, and reliability of assessment. WGCTA-style questions were chosen as a model for the DOT test since the practice version was freely available online with accompanying solutions and rationale for those solutions (AssessmentDay Ltd, 2015). The WGCTA is a multiple choice test, and while open-ended response questions more accurately reflect the nature of critical thinking and provide the most reliable results, as the number of participants increases the costs in terms of time and funds become challenging and multiple choice questions become more viable (Ennis, 1993).

The WGCTA is an 85 item test which has undergone extensive scrutiny in the literature since its inception in the 1920s (Behar-Horenstein and Niu, 2011; Huber and Kuncel, 2016). The WGCTA covers the core principles of critical thinking divided into five sections: inference, assumption identification, deduction, interpreting information and evaluation of arguments. The questions test each aspect of critical thinking independent of context. Each section consists of brief instructions and three or four short parent statements. Each parent statement acts as a prompt for three to seven subsequent questions. The instructions, parent statements and the questions themselves are concise with respect to language and reading requirements. The fact that the WGCTA focuses on testing assumptions, deductions, inferences, analysing arguments and interpreting information is an inherent limitation in its ability to assess all critical thinking skills and behaviours. However, these elements are commonly described by many definitions of critical

Below is an example of paired statements and questions written for the DOT P. This question is revisited throughout this paper to illustrate the evolution of the test throughout the study. The DOT P used essentially the same instructions as provided on the WGCTA practice test. In later versions of the DOT test the instructions were changed, as will be discussed later.

The parent statement of the exemplar question from the WGCTA required the participant to recognise that proposition A and proposition B are different, and explicitly stated that there is a relationship between proposition A and proposition C. This format was used to generate the following parent statement:

A chemist tested a metal centred complex by placing it in a magnetic field. The complex was attracted to the magnetic field. From this result the chemist decided the complex had unpaired electrons and was therefore paramagnetic rather than diamagnetic.

In writing an assumption question for the DOT test, paramagnetic and diamagnetic behaviour of metal complexes replaced propositions A and B. The relationship between propositions A and C was replaced with paramagnetic behaviour being related to unpaired electrons. The question then asked if it is a valid or invalid assumption that proposition B is not related to proposition C.

Diamagnetic metal centred complexes do not have any unpaired electrons.

The correct answer was a valid assumption, as this question required the participant to identify that propositions B and C were not related. The explanation for the correct answer was as follows:

The paragraph suggests that if the complex has unpaired electrons it is paramagnetic. This means diamagnetic complexes likely cannot have unpaired electrons.

All 85 questions on the WGCTA practice test were analysed in the manner exemplified above to develop the DOT P. In designing the test there were two requirements that had to be met. Firstly, the test needed to be able to be completed comfortably within 30 minutes, to allow it to be administered in short time frames, such as at the end of laboratory sessions, and to increase the likelihood of voluntary completion by students. Secondly, the test needed to be able to accurately assess the critical thinking of chemistry students at any level of study, from first year general chemistry students to final year students. To this end, chemistry terminology was carefully chosen to ensure that prior knowledge of chemistry was not necessary to comprehend the questions. Chemical phenomena were explained and contextualised completely within the parent statement and the questions.

DOT pilot study: content validity
thinking (Facione, 1990; Halpern, 1996b).
The questions from the WGCTA practice test were analysed Members of the Monash Chemistry Education Research Group
and used to structure the DOT test. The pilot version of the DOT attempted the DOT P and their feedback was collected through
test (DOT P) was initially developed with 85 questions, set informal discussion with the researcher and was considered an
within a chemistry or science context, and using similar structure exploratory discussion of content validity. The group consisted
and instructions to the WGCTA with five sub-scales: making of two teaching and research academics, one post-doctoral
assumptions, analysing arguments, developing hypotheses, researcher and two PhD students. The group received the
testing hypotheses, and drawing conclusions. intended responses to the DOT P questions and identified

which questions they felt did not elicit the intended critical thinking behaviour, citing poor wording and instances where the chemistry was poorly conveyed. Participants also expressed frustration with the selection of five potential options in the ‘Developing Hypotheses’ section, having trouble distinguishing between options such as ‘true’ or ‘probably true’ and ‘false’ or ‘probably false’.

The test took in excess of 40 minutes to complete. Therefore, questions which were identified as unclear, which did not elicit the intended responses, or which caused misconceptions of the scientific content were removed. The resulting DOT V1 contained seven questions relating to ‘Making Assumptions’, seven questions relating to ‘Analysing Arguments’, six questions relating to ‘Developing Hypotheses’, five questions relating to ‘Testing Hypotheses’ and five questions relating to ‘Drawing Conclusions’. The terms used to select a multiple choice option were written in a manner more accessible to science students, for example using terms such as ‘Valid Assumption’ or ‘Invalid Assumption’ instead of ‘Assumption Made’ or ‘Assumption Not Made’. Finally, the number of options in the ‘Developing Hypotheses’ section was reduced from five to three: ‘likely to be an accurate inference’, ‘insufficient information to determine accuracy’ and ‘unlikely to be an accurate inference’.

Data treatment of responses to the DOT test and WGCTA-S

Several iterations of the DOT test (and the WGCTA-S in Stage 4) were administered to a variety of participants throughout this study. The responses to questions on these tests and the test scores were used in the statistical evaluations of the DOT test. The following section outlines the data treatment and statistical approaches applied to test responses throughout this study.

All responses to the DOT test and WGCTA-S were imported into IBM SPSS Statistics (V22). Frequency tables were generated to identify erroneous or missing data. Data was considered erroneous when participants had selected ‘C’ to questions which only contained options ‘A’ or ‘B’, or when undergraduate students identified their education/occupation as that of an academic. The erroneous data was deleted and treated as missing data points. In each study a variable was created to determine the sum of unanswered questions (missing data) for each participant. Pallant (2016, pp. 58–59) suggests a judgement call is required when considering missing data and whether or not to treat certain cases as genuine attempts to complete the test. In the context of this study a genuine attempt was based on the number of questions a participant left unanswered. Participants who attempted at least 27 questions were considered to have genuinely attempted the test. Responses to all DOT test (and WGCTA-S) questions were coded as correct or incorrect responses. Upon performing descriptive statistics, the DOT V1 scores were found to exhibit a normal (Gaussian) distribution whereas the DOT V3 exhibited a non-parametric (not normal) distribution. In light of these distributions it was decided to treat all data obtained as non-parametric.

Internal reliability of each iteration of the DOT test was determined by calculating Cronbach’s α (Cronbach, 1951). Within this study the comparison between two continuous variables was made using the non-parametric equivalent of Pearson’s r, the Spearman rank order test, as recommended by Pallant (2016). Continuous variables included DOT test scores and previous academic achievement as measured by tertiary entrance scores (ATAR score). When comparing DOT test scores between education groups, which were treated as categorical variables, a Mann–Whitney U test, the non-parametric equivalent of a t-test, was used. When comparing a continuous variable measured on the same participants at different times, a Wilcoxon signed rank test, the non-parametric equivalent of a paired t-test, was used.

DOT V1: internal reliability and content validity

Initial internal reliability testing of the DOT V1 was carried out with first year chemistry students in semester 1 of 2016. They were enrolled in either Chemistry I, a general chemistry course, or Advanced Chemistry I, for students who had previously studied chemistry. Content validity was evaluated through a focus group of a science education research community of practice at Monash University. This sample was self-selected as participation in the community of practice was an opt-in activity for academics with an inherent interest in education. Furthermore, data collection was convenient and opportunistic as the availability of some members was limited, with some unable to attend both sessions.

Internal reliability method

The DOT V1 was administered to the entire first year cohort of approximately 1200 students at the conclusion of a compulsory laboratory safety induction session during the first week of semester. Students completed the test on an optically read multiple choice answer sheet. 744 answer sheets were submitted and the data imported into Microsoft Excel and treated according to the procedure outlined in the data treatment section above. 615 cases were used for statistical analysis after the data was filtered. As approximately half the cohort genuinely attempted the DOT V1, the data produced was likely to be representative of the overall cohort (Krejcie and Morgan, 1970). However, the data may be reflective of self-selecting participants who may be high achieving students or inherently interested in developing their critical thinking skills. At the time, demographic data such as sex, previous chemical/science knowledge, previous academic achievement and preferred language(s) were not collected, and it is possible one or more of these discriminants may have impacted performance on the test. Demographic data was subsequently collected and analysed for later versions of the DOT. Descriptive statistics found the DOT V1 scores exhibited a normal (Gaussian) distribution. Internal consistency was then determined by calculating Cronbach’s α (Cronbach, 1951).
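The genuine-attempt filter and internal consistency statistic described above can be sketched in a few lines of Python. The study itself used SPSS and Excel; the sketch below is an illustration only, with a simulated 0/1 response matrix and hypothetical column names (q1 to q30), keeping only the cut-off of at least 27 answered items from the text:

```python
import numpy as np
import pandas as pd

# Simulated response matrix: rows = participants, columns = the 30 DOT V1
# items, scored 1 (correct) / 0 (incorrect), NaN = unanswered.
rng = np.random.default_rng(0)
responses = pd.DataFrame(
    rng.integers(0, 2, size=(100, 30)).astype(float),
    columns=[f"q{i + 1}" for i in range(30)],
)
responses.iloc[:5, 10:] = np.nan  # five participants abandoned the test early

# Genuine-attempt filter: keep participants who answered at least 27 items.
answered = responses.notna().sum(axis=1)
genuine = responses[answered >= 27]

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a participants-by-items matrix of 0/1 scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

alpha = cronbach_alpha(genuine.dropna())
```

On real response data this function reproduces the kind of α values reported in the results; on the uncorrelated random data simulated here α is expectedly close to zero.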

Content validity method

The focus groups were conducted over two separate one hour meetings consisting of fifteen and nine participants respectively. Only five participants were common to both sessions. Participants were provided with the DOT V1 and asked to complete the questions from a given section. After completing a section of the DOT V1, participants were asked to discuss their responses, their reasoning for their responses and to comment on how they might improve the questions to better elicit the intended response. The focus groups were recorded, transcribed and analysed in line with the procedures and theoretical frameworks described previously.

DOT V2: test–retest reliability, convergent validity and content validity

Several changes were made based upon analysis of the data obtained from students and academics to produce the DOT V2. Many parent statements were rewritten to include additional information in order to reduce the need to draw on knowledge external to the questions. For example, in the questions that used formal charges on anions and cations, statements were included to describe the superscripts denoting charges: ‘Carbonate (CO₃²⁻) has a formal charge of negative 2.’

The most extensive rewriting of the parent statements occurred in the ‘Analysing Arguments’ section. The feedback provided from the focus groups indicated that parent statements did not include sufficient information to adequately respond to the questions.

Additional qualifying statements were added to several questions in order to reduce ambiguity. In the parent statement of the exemplar question the first sentence was added to eliminate the need to understand that differences exist between diamagnetic and paramagnetic metal complexes with respect to how they interact with magnetic fields:

Paramagnetic and diamagnetic metal complexes behave differently when exposed to a magnetic field. A chemist tested a metal complex by placing it in a magnetic field. From the result of the test the chemist decided the metal complex had unpaired electrons and was therefore paramagnetic.

Finally, great effort was made in the organisation of the DOT V2 to guide the test taker through a critical thinking process. In Halpern’s approach to analysing an argument (Halpern, 1996a), an argument is comprised of several conclusions, and the credibility of these conclusions must be evaluated. Furthermore, the validity of any assumptions, inferences and deductions used to construct the conclusions within an argument needs to be analysed. To this end the test taker was provided with scaffolding from making assumptions to analysing arguments, in line with Halpern’s approach.

Test–retest reliability and convergent validity method

A cross-sectional study of undergraduate students was used to investigate test–retest reliability, content and convergent validity of the DOT V2. Participants for the study were recruited by means of advertisements in Monash chemistry facilities and learning management system pages. The invitation was open to any current Monash student studying a chemistry unit or who had previously completed a chemistry unit. 20 students attended the first day of the study and 18 of these students attended the second day. The initial invitation would have reached almost 2000 students. Therefore, the findings from the 18 students who participated in both days of the study were of limited generalisability.

On the first day, demographic data was collected: sex, dominant language, previous academic achievement using tertiary entrance scores (ATAR), level of chemistry being studied and highest level of chemistry study completed at Monash University. Students completed the DOT V2 using an optical reader multiple choice answer sheet. This was followed by completion of the WGCTA-S in line with procedures outlined by the Watson-Glaser critical thinking appraisal short form manual (2006). The WGCTA-S was chosen for analysis of convergent validity as it was similar in length to the DOT V2 and was intended to measure the same aspects of critical thinking. The fact that participants completed the DOT V2 and then the WGCTA-S may have affected the participants’ performance on the WGCTA-S. This limitation will be addressed in the results.

After a brief break, the participants were divided into groups of five to eight students and interviewed about their overall impression of the WGCTA-S and their approach to various questions. Interviewers prevented the participants from discussing the DOT V2 so as not to influence each other’s responses upon retesting.

On the second day participants repeated the DOT V2. DOT V2 attempts were completed on consecutive days to minimise participant attrition. Upon completion of the DOT V2 and after a short break, participants were divided into two groups of nine and interviewed about their impressions of the DOT V2, how they approached various questions and comparisons between the DOT V2 and WGCTA-S.

Responses to the tests and demographic data were imported into IBM SPSS (V22). Data was treated in accordance with the procedure outlined earlier. With the exception of tertiary entrance score, there was no missing or erroneous demographic data. Spearman rank order correlations were performed comparing ATAR scores to scores on the WGCTA-S and the DOT V2. Test–retest reliability was determined using a Wilcoxon signed rank test (Pallant, 2016, pp. 234–236, 249–253). When the scores of the tests taken at different times have no significant difference, as determined by a p value greater than 0.05, the test can be considered to have acceptable test–retest reliability (Pallant, 2016, p. 235). Acceptable test–retest reliability does not imply that test attempts are equivalent. Rather, good test–retest reliability suggests that the precision of the test to measure the construct of interest is acceptable. Median scores of the participants’ first attempt of the DOT V2 were compared with the median score of the participants’ second attempt of the DOT V2. To determine the convergent validity of the DOT V2, the relationship between scores on the DOT V2 and performance on the WGCTA-S was investigated using Spearman rank order correlation.
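The two procedures named in this method, a Wilcoxon signed rank test for test–retest reliability and a Spearman rank order correlation for convergent validity, can be sketched with SciPy. All scores below are invented purely to show the calls; only the sample size of 18 paired attempts mirrors the study:

```python
import numpy as np
from scipy import stats

# Invented scores for 18 participants: DOT V2 on day 1, and day 2 scores
# differing by small hand-picked amounts (some zero differences included).
rng = np.random.default_rng(1)
day1 = rng.integers(12, 28, size=18)
day2 = day1 + np.array([1, -1, 0, 2, -2, 1, 0, -1, 1, 0, 2, -1, 0, 1, -1, 0, 1, -2])

# Test-retest reliability: Wilcoxon signed rank test on the paired attempts.
# A p value above 0.05 indicates no significant difference between attempts
# (zero differences are dropped by the default zero_method).
w_stat, w_p = stats.wilcoxon(day1, day2)

# Convergent validity: Spearman rank order correlation between DOT V2
# scores and (hypothetical) WGCTA-S scores for the same participants.
wgcta = day1 + rng.integers(-6, 7, size=18)
rho, rho_p = stats.spearmanr(day1, wgcta)
```

With nearly balanced positive and negative day-to-day differences, the Wilcoxon p value is large, the pattern the study interprets as acceptable test–retest reliability.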


Content validity method

In each of the interviews, participants were provided with blank copies of the relevant test (the WGCTA-S on day 1 and the DOT V2 on day 2). Participants were encouraged to make any general remarks or comments with respect to the tests they had taken, with the exception of the interviews on day 1, where interviewers prevented any discussion of questions from the DOT V2. Beyond the guidelines described in this section, the interviewers did not influence the participants’ discussion, correct their reasoning or provide the correct answers to either test.

After approximately 15 minutes of participants freely discussing the relevant test, the interviewers asked the participants to look at a given section of a test, for example the ‘Testing Hypotheses’ section of the DOT V2, and identify any questions they found problematic. In the absence of students identifying any problematic questions, the interviewers used a list of questions from each test to prompt discussion. The participants were then asked as a group:

• ‘What do you think the question is asking you?’
• ‘What do you think is the important information in this question?’
• ‘Why did you give the answer(s) you did to this question?’

The interview recordings were transcribed and analysed in line with the procedures and theoretical frameworks described previously, resulting in four distinct themes which were used to code the transcripts.

DOT V3: internal reliability, criterion validity, content validity and discriminate validity

Detailed instructions and a cover page were added to the DOT V3, as participants from the study of the DOT V2 had drawn heavily on any worked examples in the introduction of each section. Carefully written examples were provided in the introduction of each section of the DOT V3.

Many scientific terms were either simplified or removed in the DOT V3. In the case of the exemplar question, the focus was moved to an alloy of thallium and lead rather than a ‘metal complex’. Generalising this question to focus on an alloy allowed these questions to retain scientific accuracy and reduced the tendency for participants to draw on knowledge outside the information presented in the questions:

Metals which are paramagnetic or diamagnetic behave differently when exposed to an induced magnetic field. A chemist tested a metallic alloy sample containing thallium and lead by placing it in an induced magnetic field. From the test results the chemist decided the metallic alloy sample repelled the induced magnetic field and therefore was diamagnetic.

This statement was then followed by the prompt asking the participant to decide if the assumption presented was valid or invalid:

Paramagnetic metals do not repel induced magnetic fields.

Several terms were rewritten as their use in science implied assumptions, as identified by the student focus groups. These assumptions were not intended and hence the questions were reworded. For example, question 14 asked whether a ‘low yield’ would occur in a given synthetic route. The term ‘low yield’ was changed to ‘an insignificant amount’ to remove any assumptions regarding the term ‘yield’.

The study of the DOT V3 required participants to be drawn from several distinct groups in order to assess criterion and discriminate validity. For the purpose of criterion validity, the DOT V3 was administered to first year and third year undergraduate chemistry students, honours and PhD students and post-doctoral researchers at Monash University, and chemistry education academics from an online community of practice. Furthermore, third year undergraduate chemistry students from another Australian higher education institution (Curtin University) also completed the DOT V3 to determine discriminate validity with respect to performance on the DOT V3 outside of Monash University.

Participants

The DOT V3 was administered to undergraduate students in paper format. The test was presented in face-to-face activities such as lectures and workshops, or during laboratory safety inductions. Students responded directly onto the test. First year participants were drawn from the general chemistry unit run in semester 1 2017. The DOT V3 was administered to 576 of these students. 199 students attempted the test, representing approximately 19% of the first year cohort.

Third year participants were drawn from an advanced inorganic chemistry course at Monash University and a capstone chemical engineering course at Curtin University. 54 students (37%) responded to the DOT test at Monash University. The 23 students who completed the DOT V3 at Curtin University represented the entire cohort.

Post-doctoral researchers, honours and PhD students from Monash University were invited to attempt the DOT V3. 40 participants drawn from these cohorts attended a session where they completed the DOT V3 in paper format, marking responses directly onto the test. All cohorts who completed the test in paper format required approximately 20 to 30 minutes to complete the test.

An online discussion group of approximately 300 chemistry academics with an interest in education, predominantly from Australia, the UK and Europe, was invited to complete an online version of the DOT V3. Online completion was untimed and 46 participants completed the DOT V3.

Treatment of data

All responses to the DOT V3 were imported into IBM SPSS (V22) as 385 cases. Data was treated in accordance with the procedure outlined above. 97 cases contained missing data. A further 18 cases were excluded from analysis as these participants identified as second year students. A total of 288 cases were considered to be genuine attempts and used for statistical analysis. As will be discussed later, there was no statistical difference between the performance of third year students from Monash and Curtin Universities, and therefore the third year students were treated as one group. The Honours, PhD and
Post-Doctoral variables were combined into the education group ‘Postgraduates’ as the individual data sets were small.

Descriptive statistics of the 270 DOT V3 results revealed that a larger proportion of scores were above the mean, thus the data was considered non-parametric for the purposes of reliability and validity statistical analysis. Internal consistency was then determined by calculating Cronbach’s α (Cronbach, 1951). The five sub-scales of the DOT V3 (Making Assumptions, Developing Hypotheses, Testing Hypotheses, Drawing Conclusions and Analysing Arguments) underwent a principal component analysis to determine the number of factors affecting the DOT V3.

Criterion validity method

Several Mann–Whitney U tests were conducted to determine the criterion validity of the DOT V3. The assumption was that academics were better critical thinkers than postgraduates, that postgraduates were better critical thinkers than third year undergraduates, and that third year undergraduates were better critical thinkers than first year undergraduates. Based on this assumption, the hypothesis was that there would be a statistically significant improvement in median DOT V3 scores relative to experience within the tertiary education system.

Discriminate validity method

Discriminate validity of the DOT V3 was based on whether achievement on the DOT V3 was independent of previous academic achievement and of which university the participant attended. The effect of the higher education institution which the participant attended was considered using a Mann–Whitney U test comparing the median DOT V3 score of 3rd year Monash University chemistry students with the median DOT V3 score of 3rd year Curtin University chemistry students.

Results and discussion

As the DOT test was considered to be a psychometric test, validity and reliability testing were essential to ensure students’ critical thinking skills are accurately and precisely measured (Nunnally and Bernstein, 1994; Kline, 2005; DeVellis, 2012). Three separate reliability and validity studies were conducted throughout the course of test development, resulting in three iterations of the DOT test, the results of which are discussed here.

DOT V1: internal reliability and content validity

The internal consistency of the DOT V1 as determined via Cronbach’s α suggested the DOT V1 had limited internal reliability (α = 0.63). Therefore, the sub-scales within the DOT V1 could not confidently be added together to measure critical thinking skill.

The academic participants in the focus groups navigated the questions on the DOT V1 to arrive at the intended responses. However, there was rarely consensus within the group and a minority, usually one or two participants, disagreed with the group. The difficulties the academics had in responding to the DOT V1 were made clear from four themes which emerged from the analysis: ‘Instruction Clarity’, ‘Wording of the Question(s)’, ‘Information within the Statement’ and ‘Prior Knowledge’ (Table 2). The last theme was generally found to be associated with the other themes.

Table 2 Themes identified in the qualitative analysis of the academic focus groups for DOT V1

Theme (theme representation %) | Description | Example
Instruction clarity (15%) | Insufficient information within the instructions (sometimes due to superficial reading) | ‘‘. . .what do you mean by ‘is of significant importance’?’’
Wording of the question (41%) | Attributing meaning to specific words within a question | ‘‘. . .the language gives it away cause you say ‘rather than’.’’
Information within the parent statement (15%) | Adequate or insufficient information in the parent statement to respond to the questions | ‘‘Basically what you’re providing in the preamble is the definition, right?’’
Prior knowledge (30%) | Using or requiring prior scientific knowledge from outside the test | ‘‘I didn’t think there was any information about the negative charge. . . I didn’t know what that meant, so I tried to go on the text.’’

The theme of ‘Instruction Clarity’ was used to describe when participants either had difficulty interpreting the instructions or intentionally ignored the instructions. Several participants self-reported their tendency to only scan the instructions without properly reading them, or did not read the statement preceding a question in its entirety. When this behaviour occurred, academics were quick to draw on outside knowledge. This theme identified the need for clarity in the instructions and for relevant examples of what was meant by terms such as ‘is of significant importance’ or ‘is not of significant importance’.

The theme of ‘Wording of the Questions’ referred to evaluating the meaning and use of particular words within the questions or the parent statements. The wording of several questions led to confusion, causing the participants to draw on outside knowledge. Unclear terminology hindered non-science participants (education developers) from attempting the questions, and was further compounded by the use of terms such as ‘can only ever be’. For example, the use of the term ‘rather than’ confused participants when they knew a question had more than two alternatives.

The theme ‘Information within the Statement’ referred to the participants’ perceptions of the quality and depth of information provided in the parent statements. Participants suggested some questions appeared to be non-sequiturs with respect to the corresponding parent statements. Participants felt they did not
have enough information to make a decision, and the lack of months, two weeks or four days. Each of these studies reported
clarity in the instructions further compounded the problem for test–retest correlations ranging from p = 0.73 to 0.89, and larger
participants. p values were associated with shorter time frames between test–
The theme ‘Prior Knowledge’ identified instances when retesting. However, as the p value of the Wilcoxon’s signed rank
participants had drawn on information not provided in the test was sufficiently large (0.91), it was unlikely that the DOT V2
DOT V1 to answer the questions. Several issues regarding prior would have exhibited poor test–retest reliability were it to be
knowledge emerged from the discourse. Participants identified administered over a longer time interval.
that there were some assumptions made about the use of the
chemical notations. Finally some participants highlighted that Convergent validity
having prior knowledge, specifically in science and/or chemis- Analysis of convergent validity was conducted using the
try, was to their detriment when attempting the questions. WGCTA-S and the attempts of the DOT V2 from the first day,
as there was no statistical significance between the scores of the
two attempts of the DOT V2. The relationship between perfor-
DOT V2: test–retest reliability, mance on the DOT V2 and performance on the WGCTA-S was
convergent and content validity investigated using Spearman’s rank-order correlation to reveal
Published on 12 July 2019. Downloaded on 7/17/2019 7:22:08 AM.

a small positive correlation between the two variables (r = 0.31,


The study of the DOT V2 was interested in test–retest reliability, n = 18, p = 0.21). The WGCTA users guide (Watson and Glaser,
convergent, and content validity of the DOT V2 compared with 2006) suggests the correlation with other reasoning tests
the WGTCA-S. The participants for this cross-sectional study should reflect the degree of similarity between the tests. In
were comprised of equal numbers of male and female students, fact, Pearson reports a range of correlations from 0.48 to 0.70
17 participants identified English as their preferred language (85%) and three participants identified English as their second language (15%). Their ages ranged from 18 to 21 with a median age of 19. Six students were undertaking first year chemistry courses (30%), five were taking second year courses (25%), seven were taking third year courses (35%), one was taking fourth year (honours research) (5%), and one was not currently studying any chemistry (5%).

A total of 15 participants provided their tertiary entrance score (ATAR) as a measure of previous academic achievement. There is some discussion in the literature which suggests university entrance scores obtained in high school do not reflect intelligence and cognitive ability (Richardson, Abraham and Bond, 2012). However, a comparison of previous academic achievement, reported via ATAR scores, revealed a small positive correlation with scores obtained on the DOT V2 (r = 0.23) and a moderately positive correlation with scores obtained on the WGCTA-S (r = 0.47).

Test–retest reliability

18 participants took part in test–retesting of the DOT V2. A Wilcoxon signed rank test revealed no statistically significant change in score on the DOT V2 due to test–retesting, with a very small effect size (z = 0.11, p = 0.91, r = 0.03). The median score on the first day of testing (22.0) was similar to the median score on the second day of testing (22.5), suggesting good test–retest reliability. The main concern regarding these findings was that the two attempts of the DOT V2 were made on consecutive days. This was done in an attempt to reduce participant attrition but risked participants responding exactly as they did in their first attempt of the DOT V2 from memory. In fact, students identified that they felt they were remembering the answers from their previous attempt.

"The second time it felt like I was just remembering what I put down the day before."

The WGCTA-S manual (Watson and Glaser, 2006, pp. 30–31) listed three studies in which various test–retesting intervals were used when comparing the WGCTA with other reasoning tests (Watson and Glaser, 2006, pp. 41–42). The small correlation of the DOT V2 relative to the WGCTA-S did suggest that the DOT V2 was not necessarily measuring the same aspects of critical thinking as the WGCTA-S, as was initially presumed. The modest positive correlation may have been due to the small number (n = 20) of self-selected participants and did suggest the DOT V2 exhibited some degree of convergent validity. A larger number of participants may have provided more convincing data. For example, in a study of the physics critical thinking test based on the HCTA, as few as 45 participants was sufficient to obtain statistically significant data (Tiruneh et al., 2016).

Content validity

The group interviews provided evidence that participants recognised the importance of the context of the tests and that they felt more comfortable doing the DOT V2 as they attached greater significance to the chemistry context.

"I found the questions (on the DOT V2) a bit more interesting and engaging in general whereas this one (WGCTA-S) seemed a bit more clinical."

However, two participants did express their preference for the WGCTA-S, citing the detailed examples in the instructions of each section, and their frustration when attempting the DOT V2, which required them to recognise whether they were drawing on chemistry knowledge outside of the question.

The qualitative analysis of the student focus group data provided useful insight regarding the content validity of the DOT V2. When discussing their responses, the participants often arrived at a group consensus on the correct answers for both the DOT V2 and the WGCTA-S. Rarely did the participants initially arrive at a unanimous decision. In several instances on both tests, there were as many participants in favour of the incorrect response as there were participants in favour of the correct response. Four themes emerged from the analysis of the transcripts, which are presented in Table 3.
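The test–retest comparison reported above (a Wilcoxon signed-rank test on paired scores, with effect size r = |z|/sqrt(N)) can be sketched using scipy. The paired scores below are illustrative stand-ins, not the study's raw data:

```python
import math

from scipy import stats

# Paired scores from two sittings of the same test: illustrative values
# only, not the study's raw DOT V2 data.
day1 = [22, 20, 25, 18, 23, 21, 24, 19, 22, 23, 20, 26, 17, 22, 24, 21, 23, 22]
day2 = [22, 21, 24, 18, 24, 21, 25, 19, 23, 22, 20, 26, 18, 22, 24, 22, 23, 22]

# Wilcoxon signed-rank test for the paired (test-retest) design.
res = stats.wilcoxon(day1, day2)

# Effect size r = |z| / sqrt(N), recovering |z| from the two-sided p-value.
n = len(day1)
z = stats.norm.isf(res.pvalue / 2)
r = z / math.sqrt(n)
print(f"W = {res.statistic:.1f}, p = {res.pvalue:.3f}, r = {r:.2f}")
```

Recovering |z| from the two-sided p-value via the inverse normal survival function is only approximate when scipy computes an exact p-value for a small sample; for larger samples the two agree closely.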


Table 3 Themes identified in the qualitative analysis of the student focus groups for the DOT V2 and the WGCTA-S

Theme (theme representation %): Strategies used to attempt test questions (46%)
Description: Approaches participants took, including dependence on examples, evaluating key words, and construction of rules or hypothetical scenarios
Example: ". . .you could come back to it and then look at how each of the example questions were answered. . ."

Theme (theme representation %): Difficulties associated with prior knowledge (21%)
Description: Participants consciously aware of their prior knowledge, either attempting to restrict its use or finding their prior knowledge in conflict with their response to a given question
Example: "It's quite difficult to leave previous knowledge and experience off when you're trying to approach these (questions)."

Theme (theme representation %): Terms used to articulate cognitive processes (22%)
Description: Evidence of critical thinking and critical thinking terminology the participants were exposed to throughout the focus groups, in particular 'bias'
Example: "I think like the first section. . .was more difficult than the other because I think I had more bias in that question."

Theme (theme representation %): Evidence of peer learning (11%)
Description: Discourse between participants in which new insight was gained regarding how to approach test questions
Example: "To me, the fact that you know it starts talking about. . .fall outside of the information and is therefore an invalid assumption."
The theme ‘Strategies used to attempt test questions’ describes meaning’ of a variety of stimuli (Facione, 1990, p. 8). Others
Published on 12 July 2019. Downloaded on 7/17/2019 7:22:08 AM.

both the participants’ overall practice and increasing familiarity consider this behaviour to be more reflective of problem
with the style of questions, and also the specific cognitive solving skills, describing the behaviour as ‘understanding of
techniques used in attempting to answer questions. The approach the information given’ in order to build a mental representa-
participants used when performing these tests was reflective of tion of the problem (OECD, 2014, p. 31). These patterns of
the fact they became more familiar with the style of questions and problem solving were most evident in the discussions in
their dependence on the examples provided diminished. response to the WGCTA-S questions.
Some participants had difficulty understanding what was With respect to the DOT V2, participants had difficulty
meant by ‘Assumption Made’ and ‘Assumption Not Made’ in the understanding the intended meaning of the questions without
‘Recognition of Assumption’ section in the WGCTA-S and drew drawing on the previous knowledge of chemistry and science.
heavily on the worked examples provided in the introduction to For example, there were unexpected discussions of the implica-
the section. At the conclusion of this study, these participants tion of the term ‘lower yield’ in a chemistry context and the
had completed three critical thinking tests and were becoming relationship to a reaction failing. Participants pointed out
familiar with how the questions were asked and what was underlying assumptions associated with terms such as ‘yield’
considered a correct response. However, test–retesting with the highlighting that the term ‘yield’ was not necessarily reflective
DOT V2 indicated that there was no change in performance. of obtaining a desired product.
There was concern that providing detailed instructions on The theme ‘Difficulties associated with prior knowledge’
the DOT test may in fact develop the participants’ critical described when participants drew on knowledge from outside
thinking skills in the process of attempting to measure it. For the test in efforts to respond to the questions. In both the
example, a study conducted (Heijltjes et al., 2015) with 152 WGCTA-S and the DOT V2, the instructions clearly stated to
undergraduate economics students who were divided into six only use the information provided within the parent statements
approximately equal groups found that participants who were and the questions. These difficulties were most prevalent when
exposed to the written instructions performed on average 50% participants described their experiences with the DOT V2. For
better on the critical thinking skills test compared to those who example, the participants were asked to determine the validity
did not receive written instructions. It does seem plausible that of a statement regarding the relationship between the formal
a similar result would occur with the DOT test, and evaluating charge of anions and how readily anions accept hydrogen ions.
the impact of instructions and examples using control and test In arriving at their answer, one student drew on their outside
groups would be beneficial in future studies of the DOT test. knowledge of large molecules such as proteins to suggest:
The second aspect of this theme was the application of ‘‘What if you had some ridiculous molecule that has like a
problem solving skills and the generation of hypothetical 3 minus charge but the negative zones are all the way inside the
scenarios whereby deductive logic could be applied. The follow- molecule then it would actually accept the H plus?’’
ing quote described an example of a participant explicitly While this student’s hypothesis led them to decide that the
categorising the information they were provided with in the assumption was invalid, which was the intended response, the
parent statements and systematically analysing those relation- intended approach of this question was to recognise that
ships to answer the questions. the parent statement made no reference to how strongly cations
‘‘I find that with (section) three, deduction, that it was really easy and anions are attracted to each other.
to think in terms of sets, it was easier to think in terms set rather It was concerning that some participants felt they had to
than words, doodling Venn diagrams trying to solve these ones.’’ ‘un-train’ themselves of their chemical knowledge in order to
The Delphi report considers behaviours described by properly engage with the DOT V2. Some participants high-
this theme to be part of the interpretation critical thinking lighted that they found the WGCTA-S easier as they did not
skill which describes the ability ‘to detect . . . relationships’ or have to reflect on whether they were using their prior knowl-
‘to paraphrase or make explicit . . . conventional or intended edge. However, many participants were asking themselves


'why am I thinking what I'm thinking?', which is indicative of high order metacognitive skills described by several critical thinking theoreticians (Facione, 1990, p. 10; Kuhn, 2000; Tsai, 2001). Students appear to be questioning their responses to the DOT V2 and whether their responses are based on their own pre-existing information or the information presented within the test, as highlighted in the following statement.

"You had to think more oh am I using my own knowledge or what's just in the question? I was like so what is assumed to be background knowledge. What's background knowledge?"

The theme 'Terms used to articulate cognitive processes' described the participants applying the language from the instructions of the WGCTA-S and the DOT V2 to articulate their thought processes. In particular, participants were very aware of their prior knowledge, referring to this as 'bias'.

In response to the questions in the 'Developing Hypothesis' section, which related to the probability of failure of an esterification reaction, one student identified that they attempted to view the questions from the perspective of individuals with limited scientific knowledge in order to minimise the influence of their prior chemical knowledge on their responses. There was much discussion of what was meant by the term 'failure' in the context of a chemical reaction and whether failure referred to unsuccessful collisions at a molecular level or the absence of a product at the macroscopic level.

The students engaged in dialogues which helped refine the language they used in articulating their thoughts or helped them recognise thinking errors. This describes the final emergent theme of 'Evidence of peer learning'. For example, when discussing their thought processes regarding a question in the 'Deduction' section of the WGCTA-S, one student shared their strategy of having constructed mental Venn diagrams and had correctly identified how elements of the question related. This prompted other students to recognise the connection they had initially failed to make and to reconsider their response.

DOT V3: internal reliability, criterion and discriminate validity

Table 4 summarises the demographic data according to education group. The distribution of sex was representative of first year, third year undergraduates and postgraduates. The distribution of sex and age for the academics education group contained slightly more males (58%), and the median age (50) would suggest the majority of academics were in mid to late career. The mean ATAR score was reflective of the high admissions standards set by the two universities. Fewer postgraduates and academics provided an ATAR score, as many of these participants may not have completed their secondary education in Australia, or may have completed it before the ATAR was introduced in 2009. 271 (91.6%) participants identified English as their preferred language.

Table 4 Demographic data of participants for the DOT V3

Education group    Mean ATAR           Female             Male
First year         87.10 (n = 104)     54% (n = 64)       43% (n = 51)
Third year         90.03 (n = 55)      39% (n = 26)       61% (n = 41)
Postgraduates      87.35 (n = 19)      43% (n = 19)       57% (n = 25)
Academics          89.58 (n = 3)       38% (n = 15)       58% (n = 23)
Overall            88.23 (n = 181)     46% (n = 124)      52% (n = 140)

Internal reliability

The internal consistency of the DOT V3 using Cronbach's α was found to be acceptable (α = 0.71), which suggested the DOT V3 exhibited acceptable internal reliability (DeVellis, 2012, p. 109) and the sub-scales could confidently be added together to measure critical thinking skill. In order to generate further evidence with respect to the internal reliability, the sub-scales of the DOT V3 were determined to be suitable for factor analysis. The analysis revealed all correlations to be greater than 0.3, with the exception of the correlation between the 'Developing Hypotheses' and 'Drawing Conclusions' sections (0.26). Principal factor analysis revealed the DOT V3 was unidimensional, with one factor explaining 50.31% of total variance and all sub-scales correlated with one factor (0.79–0.59), in this case likely the overall score on the DOT V3. When factor analysis was performed on the sub-scales of the WGCTA it was also found to be unidimensional (Hassan and Madhum, 2007).

Criterion validity

Table 5 shows that significant differences in median scores were found between all education groups, with the exception of those of postgraduates compared to those of the academics. Of particular interest was that medium (r > 0.3) and large (r > 0.5) effect sizes were obtained when comparing the median scores of first year students with third year students, postgraduates and academics. These findings provided strong evidence that the DOT V3 possessed good criterion validity when measuring the critical thinking skills of chemistry students up to and including postgraduate level.

Interestingly, there appeared to be no statistically significant difference in DOT V3 scores when comparing postgraduates and academics. If the assumption that critical thinking skill is correlated positively to time spent in tertiary education environments is valid, it is likely that the DOT V3 was not sensitive enough to detect any difference in critical thinking skill between postgraduates and academics.

Discriminate validity

Several Mann–Whitney U tests were conducted to determine if sex was a significant predictor of performance on the DOT V3. These tests were conducted on the cohort as a whole and at an education group level (Table 6). The overall comparison of median scores would suggest that sex is a discriminator which affects performance on the DOT V3, favouring males. When viewed at the educational group level, a statistically significant difference in performance is only observed at the first year level. Typically, researchers do not find any differences between sex and test scores of other critical thinking tests (Hassan and Madhum, 2007; Butler, 2012). However, differences in performance on aptitude tests and argumentative writing tests based on the sex of the participants is not unheard of in the literature


Table 5 Mann–Whitney U tests comparing the median scores obtained on the DOT V3 of each education group

Education group                      1st year               3rd year               P'grad                 Academic
1st year (n = 119, Md = 16)          –                      p < 0.001, r = 0.39    p < 0.001, r = 0.64    p < 0.001, r = 0.59
3rd year (n = 67, Md = 21)           p < 0.001, r = 0.39    –                      p < 0.001, r = 0.30    p = 0.003, r = 0.28
Postgraduates (n = 44, Md = 23.5)    p < 0.001, r = 0.53    p < 0.001, r = 0.30    –                      p = 0.691, r = 0.04
Academic (n = 40, Md = 24)           p < 0.001, r = 0.59    p = 0.003, r = 0.28    p = 0.691, r = 0.04    –
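The pairwise group comparisons in Table 5 rest on the Mann–Whitney U test for independent groups, with effect size r = |z|/sqrt(N). A minimal sketch using scipy, with illustrative scores rather than the study's raw DOT V3 data:

```python
import math

from scipy import stats

# Test scores for two education groups: illustrative values only,
# not the study's raw DOT V3 data.
first_year = [14, 16, 15, 18, 17, 12, 16, 19, 15, 14, 17, 16, 13, 18, 15, 20]
third_year = [19, 22, 21, 24, 20, 23, 18, 22, 21, 25, 19, 23, 20, 22]

# Mann-Whitney U test comparing two independent groups.
u, p = stats.mannwhitneyu(first_year, third_year, alternative="two-sided")

# Effect size r = |z| / sqrt(N), recovering |z| from the two-sided p-value.
n_total = len(first_year) + len(third_year)
z = stats.norm.isf(p / 2)
r = z / math.sqrt(n_total)
print(f"U = {u:.1f}, p = {p:.4f}, r = {r:.2f}")
```

With N the total number of observations across both groups, r near 0.3 and 0.5 correspond to the medium and large effect thresholds discussed above.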

(Halpern et al., 2007; Preiss et al., 2013). Beyond the first year, sex appears not to be a discriminator of score on the DOT V3. Further evaluation of the test will be conducted on a larger sample size of first year students to see if the difference between sexes persists.

Table 6 Mann–Whitney U tests comparing the median score of the DOT V3 as determined by sex

Education group    Female               Male                 Significance
1st year           n = 64, Md = 15      n = 51, Md = 18      p = 0.007, r = 0.25
3rd year           n = 26, Md = 20.5    n = 41, Md = 21      p = 0.228, r = 0.15
Postgraduate       n = 19, Md = 24      n = 25, Md = 23      p = 0.896, r = 0.02
Academic           n = 15, Md = 24      n = 23, Md = 22      p = 0.904, r = 0.02

Using Spearman's rank-order correlation coefficient, there was a weak, positive correlation between DOT V3 score and ATAR score (r = 0.20, n = 194, p = 0.01), suggesting previous achievement had only a minor bearing on DOT V3 score. This correlation was in line with previous observations collected during testing of the DOT V2, where previous academic achievement and performance on the test were found to have a small correlation (r = 0.23), although that sample size was small (n = 15). However, as the sample size used in the study of this relationship in the DOT V3 was much larger (n = 194), these findings suggested performance on the DOT V3 was only slightly correlated to previous academic achievement.

In order to determine the validity of the DOT V3 outside of Monash University, the median scores of third year students from Monash University and Curtin University were compared using a Mann–Whitney U test. The test revealed no significant difference in the score obtained by Monash University students (Md = 20, n = 44) and Curtin University students (Md = 22, n = 24), U = 670.500, z = 1.835, p = 0.07, r = 0.22. Therefore, the score obtained on the DOT V3 was considered independent of where the participant attended university. It is possible that an insufficient number of tests were completed, due to the opportunistic sampling from both universities, and obtaining equivalent sample sizes across several higher education institutions would confirm whether the DOT V3 performs well across higher education institutions.

Conclusion

A chemistry critical thinking test (DOT test), which aimed to evaluate the critical thinking skills of undergraduate chemistry students at any year level irrespective of the students' prior chemistry knowledge, was developed. Test development included a pilot study (DOT P) and three iterations of the test (DOT V1, DOT V2 and DOT V3). Throughout the development of the various iterations of the test, reliability and validity studies were conducted. The findings from these studies demonstrate that the most recent version of the test, DOT V3, had good internal reliability. Determination of discriminate validity showed that the DOT V3 score was independent of previous academic achievement (as measured by ATAR score). Additionally, performance on the DOT V3 was found to be independent of whether the participant attended Monash or Curtin University. The test was found to exhibit strong criterion validity by comparing the median scores of first year undergraduate, third year undergraduate and postgraduate chemistry students, and academics from an online community of practice. However, the DOT V3 was unable to distinguish between postgraduate and academic participants. Additionally, qualitative studies conducted throughout test development highlighted the inherent difficulty of assessing critical thinking independent of context, as participants expressed difficulty restricting the use of their prior chemical knowledge when responding to the test. This qualitative finding lends evidence to the specifist perspective that critical thinking cannot truly be independent of context.

The DOT V3 offers a tool with which to measure a student's critical thinking skills and the effect of any teaching interventions specifically targeting the development of critical thinking. The test is suitable for studying the development of critical thinking using a cross section of students, and may be useful in longitudinal studies of a single cohort. In summary, research conducted within this study provides a body of evidence regarding the reliability and validity of the DOT test, and it offers the chemistry education community a valuable research and educational tool with respect to the development of undergraduate chemistry students' critical thinking skills. The DOT V3 is included in Appendix 1 (ESI†) (administrator guidelines and intended responses to the DOT V3 can be obtained upon email correspondence with the first author).

Implications for practice

Using the DOT V3, it may be possible to evaluate the development of critical thinking across a degree program, like much of the literature which has utilised commercially available tests (Carter et al., 2015). Using the DOT V3 it may be possible to obtain baseline data regarding the critical thinking skills of


students and use this data to inform teaching practices aimed at developing critical thinking skills of students in subsequent years.

Whilst there is the potential to measure the development of critical thinking over a semester using the DOT V3, there is evidence to suggest that a psychological construct, such as critical thinking, does not develop enough for measurable differences to occur in the space of only a semester (Pascarella, 1999). While the DOT V3 could be administered to the same cohort of students annually to form the basis of a longitudinal study, there are many hurdles to overcome in such a study, including participant retention and their developing familiarity with the test. Much like the CCTST and the WGCTA pre- and post-testing (Jacobs, 1999; Carter et al., 2015), at least two versions of the DOT V3 may be required for pre- and post-testing and for longitudinal studies. However, having a larger pool of questions does not prevent the participants from becoming familiar with the style of critical thinking questions. Development of an additional test would require further reliability and validity testing (Nunnally and Bernstein, 1994). However, cross-sectional studies are useful in identifying changes in critical thinking skills, and the DOT V3 has demonstrated it is sensitive enough to discern between the critical thinking skills of first year and third year undergraduate chemistry students.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

The authors would like to acknowledge participants from Monash University, Curtin University and academics from the community of practice who took the time to complete the various versions of the DOT test and/or participate in the focus groups. This research was made possible through Australian Postgraduate Award funding and with guidance of the Monash University Human Ethics Research Committee.

References

Abrami P. C., Bernard R. M., Borokhovski E., Wade A., Surkes M. A., Tamim R. and Zhang D., (2008), Instructional interventions affecting critical thinking skills and dispositions: a stage 1 meta-analysis, Rev. Educ. Res., 78(4), 1102–1134.
Abrami P. C., Bernard R. M., Borokhovski E., Waddington D. I., Wade C. A. and Persson T., (2015), Strategies for teaching students to think critically: a meta-analysis, Rev. Educ. Res., 85(2), 275–314.
AssessmentDay Ltd, (2015), Watson Glaser critical thinking appraisal, retrieved from https://www.assessmentday.co.uk/watson-glaser-critical-thinking.htm, accessed on 03/07/2015.
Bailin S., (2002), Critical thinking and science education, Sci. Educ., 11, 361–375.
Barnett R., (1997), Higher education: a critical business, Buckingham: Open University Press.
Behar-Horenstein L. S. and Niu L., (2011), Teaching critical thinking skills in higher education: a review of the literature, J. Coll. Teach. Learn., 8(2), 25–41.
Bryman A., (2008), Social research methods, 3rd edn, Oxford: Oxford University Press.
Butler H. A., (2012), Halpern critical thinking assessment predicts real-world outcomes of critical thinking, Appl. Cognit. Psychol., 26(5), 721–729.
Carter A. G., Creedy D. K. and Sidebotham M., (2015), Evaluation of tools used to measure critical thinking development in nursing and midwifery undergraduate students: a systematic review, Nurse Educ. Today, 35(7), 864–874.
Cronbach L. J., (1951), Coefficient alpha and the internal structure of tests, Psychometrika, 16, 297–334.
Danczak S. M., Thompson C. D. and Overton T. L., (2017), What does the term critical thinking mean to you? A qualitative analysis of chemistry undergraduate, teaching staff and employers' views of critical thinking, Chem. Educ. Res. Pract., 18(3), 420–434.
Davies M., (2013), Critical thinking and the disciplines reconsidered, High. Educ. Res. Dev., 32(4), 529–544.
Desai M. S., Berger B. D. and Higgs R., (2016), Critical thinking skills for business school graduates as demanded by employers: a strategic perspective and recommendations, Acad. Educ. Leadership J., 20(1), 10–31.
DeVellis R. F., (2012), Scale development: theory and applications, 3rd edn, Thousand Oaks, CA: Sage.
Dressel P. L. and Mayhew L. B., (1954), General education: explorations in evaluation, Washington, DC: American Council on Education.
Edwards D., Perkins K., Pearce J. and Hong J., (2015), Work integrated learning in STEM in Australian universities, retrieved from http://www.chiefscientist.gov.au/wp-content/uploads/ACER_WIL-in-STEM-in-Australian-Universities_June-2015.pdf, accessed on 05/12/2016.
Ennis R. H., (1989), Critical thinking and subject specificity: clarification and needed research, Educ. Res., 18(3), 4–10.
Ennis R. H., (1990), The extent to which critical thinking is subject-specific: further clarification, Educ. Res., 19(4), 13–16.
Ennis R. H., (1993), Critical thinking assessment, Theory Into Practice, 32(3), 179–186.
Ennis R. H. and Weir E., (1985), The Ennis–Weir critical thinking essay test: test, manual, criteria, scoring sheet, retrieved from http://faculty.education.illinois.edu/rhennis/tewctet/Ennis-Weir_Merged.pdf, accessed on 09/10/2017.
Facione P. A., (1990), Critical thinking: a statement of expert consensus for purposes of educational assessment and instruction. Executive summary. "The Delphi report", Millbrae, CA: T. C. A. Press.
Ferguson R. L., (2007), Constructivism and social constructivism, in Bodner G. M. and Orgill M. (ed.), Theoretical frameworks for research in chemistry and science education, Upper Saddle River, NJ: Pearson Education (US).


Flynn A. B., (2011), Developing problem-solving skills through retrosynthetic analysis and clickers in organic chemistry, J. Chem. Educ., 88, 1496–1500.
Garratt J., Overton T. and Threlfall T., (1999), A question of chemistry, Essex, England: Pearson Education Limited.
Garratt J., Overton T., Tomlinson J. and Clow D., (2000), Critical thinking exercises for chemists, Active Learn. High. Educ., 1(2), 152–167.
Glaser R., (1984), Education and thinking: the role of knowledge, Am. Psychol., 39(2), 93–104.
Gupta T., Burke K. A., Mehta A. and Greenbowe T. J., (2015), Impact of guided-inquiry-based instruction with a writing and reflection emphasis on chemistry students' critical thinking abilities, J. Chem. Educ., 92(1), 32–38.
Halpern D. F., (1993), Assessing the effectiveness of critical thinking instruction, J. General Educ., 50(4), 238–254.
Halpern D. F., (1996a), Analyzing arguments, in Halpern D. F. (ed.), Thought and knowledge: an introduction to critical thinking, 3rd edn, Mahwah, NJ: L. Erlbaum Associates, pp. 167–211.
Halpern D. F., (1996b), Thought and knowledge: an introduction to critical thinking, 3rd edn, Mahwah, NJ: L. Erlbaum Associates.
Halpern D. F., (1998), Teaching critical thinking for transfer across domains. Dispositions, skills, structure training, and metacognitive monitoring, Am. Psychol., 53, 449–455.
Halpern D. F., (2016), Manual: Halpern critical thinking assessment, retrieved from https://drive.google.com/file/d/0BzUoP_pmwy1gdEpCR05PeW9qUzA/view, accessed on 09/10/2017.
Halpern D. F., Benbow C. P., Geary D. C., Gur R. C., Hyde J. S. and Gernsbacher M. A., (2007), The science of sex differences in science and mathematics, Psychol. Sci. Public Interest, 8(1), 1–51.
Hassan K. E. and Madhum G., (2007), Validating the Watson Glaser critical thinking appraisal, High. Educ., 54(3), 361–383.
Heijltjes A., van Gog T., Leppink J. and Paas F., (2015), Unraveling the effects of critical thinking instructions, practice, and self-explanation on students' reasoning performance, Instr. Sci., 43, 487–506.
Henderson D. E., (2010), A chemical instrumentation game for teaching critical thinking and information literacy in instrumental analysis courses, J. Chem. Educ., 87, 412–415.
Huber C. R. and Kuncel N. R., (2016), Does college teach critical thinking? A meta-analysis, Rev. Educ. Res., 86(2), 431–468.
Inhelder B. and Piaget J., (1958), The growth of logical thinking from childhood to adolescence: an essay on the construction of formal operational structures, London: Routledge & Kegan Paul.
Insight Assessment, (2013), California critical thinking skills test (CCTST), Request information, retrieved from http://www.insightassessment.com/Products/Products-Summary/Critical-Thinking-Skills-Tests/California-Critical-Thinking-Skills-Test-CCTST, accessed on 07/09/2017.
Iwaoka W. T., Li Y. and Rhee W. Y., (2010), Measuring gains in critical thinking in food science and human nutrition courses: the Cornell critical thinking test, problem-based learning activities, and student journal entries, J. Food Sci. Educ., 9(3), 68–75.
Jackson D., (2010), An international profile of industry-relevant competencies and skill gaps in modern graduates, Int. J. Manage. Educ., 8(3), 29–58.
Jacob C., (2004), Critical thinking in the chemistry classroom and beyond, J. Chem. Educ., 81(8), 1216–1223.
Jacobs S. S., (1999), The equivalence of forms A and B of the California critical thinking skills test, Meas. Eval. Counsel. Dev., 31(4), 211–222.
Johnson R. H., Blair J. A. and Hoaglund J., (1996), The rise of informal logic: essays on argumentation, critical thinking, reasoning, and politics, Newport, VA: Vale Press.
Klein G. C. and Carney J. M., (2014), Comprehensive approach to the development of communication and critical thinking: bookend courses for third- and fourth-year chemistry majors, J. Chem. Educ., 91, 1649–1654.
Kline T., (2005), Psychological testing: a practical approach to design and evaluation, Thousand Oaks, CA: Sage Publications.
Kogut L. S., (1993), Critical thinking in general chemistry, J. Chem. Educ., 73(3), 218–221.
Krejcie R. V. and Morgan D. W., (1970), Determining sample size for research activities, Educ. Psychol. Meas., 30(3), 607–610.
Kuhn D., (1999), A developmental model of critical thinking, Educ. Res., 28(2), 16–26.
Kuhn D., (2000), Metacognitive development, Curr. Dir. Psychol. Sci., 9(5), 178–181.
Lehman D. R. and Nisbett R. E., (1990), A longitudinal study of the effects of undergraduate training on reasoning, Dev. Psychol., 26, 952–960.
Lehman D. R., Lempert R. O. and Nisbett R. E., (1988), The effects of graduate training on reasoning: formal discipline and thinking about everyday-life events, Am. Psychol., 43, 431–442.
Lindsay E., (2015), Graduate outlook 2014: employers' perspectives on graduate recruitment in Australia, Melbourne: Graduate Careers Australia, retrieved from http://www.graduatecareers.com.au/wp-content/uploads/2015/06/Graduate_Outlook_2014.pdf, accessed on 21/08/2015.
Lowden K., Hall S., Elliot D. and Lewin J., (2011), Employers' perceptions of the employability skills of new graduates: research commissioned by the Edge Foundation, retrieved from http://www.educationandemployers.org/wp-content/uploads/2014/06/employability_skills_as_pdf_-_final_online_version.pdf, accessed on 06/12/2016.
Martineau E. and Boisvert L., (2011), Using Wikipedia to develop students' critical analysis skills in the undergraduate chemistry curriculum, J. Chem. Educ., 88, 769–771.
McMillan J., (1987), Enhancing college students' critical thinking: a review of studies, J. Assoc. Inst. Res., 26(1), 3–29.
McPeck J. E., (1981), Critical thinking and education, Oxford: Martin Robertson.
McPeck J. E., (1990), Teaching critical thinking: dialogue and dialectic, New York: Routledge.
Monash University, (2015), Undergraduate – area of study. Chemistry, retrieved from http://www.monash.edu.au/pubs/2015handbooks/aos/chemistry/, accessed on 15/04/2015.

Chem. Educ. Res. Pract. This journal is © The Royal Society of Chemistry 2019
Moore T. J., (2011), Critical thinking and disciplinary thinking: a continuing debate, High. Educ. Res. Dev., 30(3), 261–274.
Moore T. J., (2013), Critical thinking: seven definitions in search of a concept, Stud. High. Educ., 38(4), 506–522.
Nisbett R. E., Fong G. T., Lehman D. R. and Cheng P. W., (1987), Teaching reasoning, Science, 238, 625–631.
Nunnally J. C. and Bernstein I. H., (1994), Psychometric theory, New York: McGraw-Hill.
OECD, (2014), Pisa 2012 results: creative problem solving: students’ skills in tackling real-life problems (volume v), OECD Publishing, retrieved from http://dx.doi.org/10.1787/9789264208070-en, accessed on 05/01/2018.
Oliver-Hoyo M. T., (2003), Designing a written assignment to promote the use of critical thinking skills in an introductory chemistry course, J. Chem. Educ., 80, 899–903.
Ontario University, (2017), Appendix 1: OCAV’s undergraduate and graduate degree level expectations, retrieved from http://oucqa.ca/framework/appendix-1/, accessed on 09/10/2017.
Pallant J. F., (2016), SPSS survival manual, 6th edn, Sydney: Allen & Unwin.
Pascarella E., (1999), The development of critical thinking: Does college make a difference? J. Coll. Stud. Dev., 40(5), 562–569.
Pearson, (2015), Watson-Glaser critical thinking appraisal – short form (WGCTA-S), retrieved from https://www.pearsonclinical.com.au/products/view/208, accessed on 03/07/2015.
Phillips V. and Bond C., (2004), Undergraduates’ experiences of critical thinking, High. Educ. Res. Dev., 23(3), 277–294.
Pithers R. T. and Soden R., (2000), Critical thinking in education: a review, Educ. Res., 42(3), 237–249.
Preiss D. D., Castillo J., Flotts P. and San Martin E., (2013), Assessment of argumentative writing and critical thinking in higher education: educational correlates and gender differences, Learn. Individ. Differ., 28, 193–203.
Prinsley R. and Baranyai K., (2015), STEM skills in the workforce: What do employers want? retrieved from http://www.chiefscientist.gov.au/wp-content/uploads/OPS09_02Mar2015_Web.pdf, accessed on 06/10/2015.
Richardson M., Abraham C. and Bond R., (2012), Psychological correlates of university students’ academic performance: a systematic review and meta-analysis, Psychol. Bull., 138(2), 353–387.
Sarkar M., Overton T., Thompson C. and Rayner G., (2016), Graduate employability: views of recent science graduates and employers, Int. J. Innov. Sci. Math. Educ., 24(3), 31–48.
Stephenson N. S. and Sadler-Mcknight N. P., (2016), Developing critical thinking skills using the science writing heuristic in the chemistry laboratory, Chem. Educ. Res. Pract., 17(1), 72–79.
The Critical Thinking Co, (2017), Cornell critical thinking tests, retrieved from https://www.criticalthinking.com/cornell-critical-thinking-tests.html, accessed on 9/10/2017.
Thorndike E. L. and Woodworth R. S., (1901a), The influence of improvement in one mental function upon the efficiency of other functions, (i), Psychol. Rev., 8(3), 247–261.
Thorndike E. L. and Woodworth R. S., (1901b), The influence of improvement in one mental function upon the efficiency of other functions. ii. The estimation of magnitudes, Psychol. Rev., 8(4), 384–395.
Thorndike E. L. and Woodworth R. S., (1901c), The influence of improvement in one mental function upon the efficiency of other functions: functions involving attention, observation and discrimination, Psychol. Rev., 8(6), 553–564.
Tiruneh D. T., Verburgh A. and Elen J., (2014), Effectiveness of critical thinking instruction in higher education: a systematic review of intervention studies, High. Educ. Stud., 4(1), 1–17.
Tiruneh D. T., De Cock M., Weldeslassie A. G., Elen J. and Janssen R., (2016), Measuring critical thinking in physics: development and validation of a critical thinking test in electricity and magnetism, Int. J. Sci. Math. Educ., 1–20.
Tsai C. C., (2001), A review and discussion of epistemological commitments, metacognition, and critical thinking with suggestions on their enhancement in internet-assisted chemistry classrooms, J. Chem. Educ., 78(7), 970–974.
University of Adelaide, (2015), University of Adelaide graduate attributes, retrieved from http://www.adelaide.edu.au/learning/strategy/gradattributes/, accessed on 15/04/2015.
University of Edinburgh, (2017), The University of Edinburgh’s graduate attributes, retrieved from http://www.ed.ac.uk/employability/graduate-attributes/framework, accessed on 09/10/2017.
University of Melbourne, (2015), Handbook – chemistry, retrieved from https://handbook.unimelb.edu.au/view/2015/!R01-AA-MAJ%2B1007, accessed on 15/04/2015.
Watson G. and Glaser E. M., (2006), Watson-Glaser critical thinking appraisal short form manual, San Antonio, TX: Pearson.
