INNOVATIONS IN EDUCATION AND TEACHING INTERNATIONAL
https://doi.org/10.1080/14703297.2023.2222715

Multiple-choice questions (MCQs) for higher-order cognition: Perspectives of university teachers

Qian Liu, Navé Wald, Chandima Daskon and Tony Harland
Higher Education Development Centre, University of Otago, Dunedin, New Zealand

ABSTRACT
This qualitative study looks at multiple-choice questions (MCQs) in examinations and their effectiveness in testing higher-order cognition. While there are claims that MCQs can do this, we consider many assertions problematic because of the difficulty in interpreting what higher-order cognition consists of and whether or not assessment tasks genuinely lead to specific outcomes. We interviewed university teachers from different disciplines to explore the extent to which MCQs can assess higher-order cognition specified in Bloom’s taxonomy. The study showed that study participants believed MCQs can test higher-order cognition but most likely only at the levels of ‘apply’ and ‘analyse’. Using MCQs was often driven by the practicality of assessing large classes and by a need for comparing students’ performances. MCQs also had a powerful effect on curriculum due to the careful alignment between teaching and assessment, which makes changes to teaching difficult. These findings have implications for both teaching and how higher education is managed.

KEYWORDS
Multiple-choice questions (MCQs); Bloom’s taxonomy; higher-order cognition; difficulty

Introduction
The assessment method used in examinations matters. It influences how students learn
and prepare for examinations and thus has important educational implications in terms of
the skills and knowledge they acquire and develop (Harland, 2020; Scouller, 1998).
Multiple-choice questions (MCQs) are a notable and arguably increasingly popular
method of assessment in higher education (Malau-Aduli et al., 2014). MCQs originated
in the US early in the twentieth century, and were favoured for their versatility, cost
effectiveness, and precision of measurement. They were subsequently adopted by many countries worldwide and became the subject of fierce critique and debate, especially during the 1990s (Williams, 2006), a debate that essentially continues to this day.
There are several aspects to the current debates about what MCQs are good at
assessing and how this impacts student learning processes and outcomes. There seems
to be little doubt about the efficacy of MCQs in assessing factual knowledge and recall of
information but there are increasing claims that MCQs can do much more (Douglas et al.,
2012). Less common is consistent empirical evidence that supports such claims. In this
paper, we are interested in determining how effective university teachers think MCQs are
at assessing higher-order cognition. In particular, we draw on Bloom’s taxonomy to
explore the extent to which MCQs can assess higher-order cognition.

MCQs for higher-order cognition


Higher-order cognition generally refers to the complex cognitive skills that ‘involve uncertainty, application of multiple criteria, reflection and self-regulation’ (Barak et al., 2007, p. 355). One approach to classifying cognitive skills has been Bloom’s taxonomy (Bloom et al., 1956), which has been widely used in higher education to depict educational objectives (Stringer et al., 2021), most notably by medical educators (Monrad et al., 2021). According to the taxonomy (Krathwohl, 2002), cognitive skills, including ‘remember’, ‘understand’, ‘apply’, ‘analyse’, ‘evaluate’ and ‘create’, are assumed to follow a cumulative hierarchical order (Figure 1), and higher-order cognition typically corresponds to the cognitive skills above the level of ‘understand’ (e.g. Barak et al., 2007).
Research has shown that MCQs can efficiently test recall of factual knowledge and so
are much more prevalent in the sciences and in the early years of a university education
where vast amounts of information need to be mastered. However, it is also clear that
MCQ researchers do not value this focus as an end in itself, and there is a drive to make
MCQs ‘better’ and test higher-order cognition (Jensen et al., 2014; Kıyak et al., 2022).
Several studies have investigated whether MCQs can assess higher-order cognition.
Zheng et al. (2008), for example, used Bloom’s taxonomy to examine whether medical and
biology education hinders the development of higher-order cognition by overly focusing
on factual knowledge. The authors concluded that the Medical College Admission Test
(MCAT), which comprised only MCQs, ‘fulfils its stated goal of assessing problem-
solving ability and critical thinking, in addition to mastery of basic biology concepts’
(Zheng et al., 2008, p. 414). However, their analysis suggested that the MCAT and first-year medical school examination excluded the top two levels of cognition in Bloom’s taxonomy (Figure 1), leaving only ‘application’ and ‘analysis’ being assessed by MCQs. Whether or not this level of higher-order cognition and the corresponding critical thinking this entails are sufficient is open to debate.

Figure 1. Revised Bloom’s taxonomy by Krathwohl (2002).

Tractenberg et al. (2013) applied a cognitive complexity matrix based on Bloom’s taxonomy to evaluate three tests consisting of 252 MCQs from a graduate-level physiology course. They found that nearly 70% of questions were at the lower two levels (‘remember’ and ‘understand’). Only about three per cent of questions were at the level of ‘analyse’ and no questions were at the top two levels, probably because the ‘highest levels of complexity require that students generate some response, (which is) incompatible with multiple choice questions’ (Tractenberg et al., 2013, p. 953).
Another study by Kim et al. (2012) further demonstrated the challenges in assessing
higher-order cognition through MCQs. The study reported that 13% of MCQs (n = 638)
used in a pharmaceutical course over several years were at the highest two levels of
cognition and 55% at the lowest two levels. It is worth noting that the number of
questions classified at the highest levels (n = 84) in this study is not consistent with
most other studies that found very few questions at those levels if any (e.g. Jensen
et al., 2014). This difference reflects a deeper issue around the subjectivity of mapping
MCQs to different cognitive skills identified in Bloom’s taxonomy. In more recent studies
(Thompson & O’Loughlin, 2015; Zaidi et al., 2018), researchers have sought to address the
issue of subjectivity by dichotomising the taxonomy into lower-order cognition (bottom
two levels) and higher-order cognition. Interestingly, while Zaidi et al. (2018) include the
four top levels in the higher-order category, Thompson and O’Loughlin (2015) only
include the middle two levels since they argue the top two levels cannot be assessed
using MCQs. However, there were still diverging views on whether certain MCQs assessed
lower or higher-order cognition (Monrad et al., 2021).
In addition to challenges in using MCQs for higher-order cognition, the use of MCQs as
an assessment tool has often been driven by practical concerns rather than pedagogical
considerations, such as assessing levels of cognition (Nicol, 2007). Large class sizes
resulting from the massification of higher education are the most commonly used
justification for using MCQs, even though their adoption precedes this phenomenon
(e.g. Veloski et al., 1999). It is clear that it is no longer possible to assess students in other ways when they are taught in large classes, due to both the frequency of assessments necessary and the scale of the enterprise. MCQs may require some initial investment of time and effort to develop a bank of questions, but they subsequently require substantially fewer resources than other assessments, especially when the tests are marked by computers.
Furthermore, in examinations that use MCQs, there is typically a range of questions based on level of difficulty. This range may be directly or indirectly influenced by norm-referencing so that the cohort of students will sit somewhere on a normal bell curve after an exam that allows for differentiation and a reasonable pass/fail rate (e.g. George et al., 2006). A student may be able to pass the examination without success on the more difficult questions and so miss out on potential higher-order learning processes. Such a strategy is more difficult in other forms of assessment that are known to require higher-order learning as a measured outcome (see Scouller, 1998). A further issue is the link between cognitive processes and how
hard an MCQ is to answer. Tractenberg et al. (2013) made a distinction between a question’s level of cognitive complexity and its difficulty. The level of difficulty was estimated using the Rasch model, which depicts the relationship between a person’s ability and an item’s difficulty as perceived by experts. They found that a question’s difficulty was independent of its cognitive complexity. It seems that a question can be at a higher level of Bloom’s taxonomy and quite easy to answer.
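
As a point of reference, the dichotomous Rasch model mentioned above can be written in its standard form (a general formulation, not a reproduction of the specific analysis in Tractenberg et al., 2013). The probability that person n answers item i correctly depends only on the difference between the person’s ability \(\theta_n\) and the item’s difficulty \(b_i\):

\[
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
\]

On this reading, an item’s difficulty is simply the point on the ability scale at which a student has an even chance of answering correctly, which is conceptually distinct from the level of Bloom’s taxonomy the item was written to target.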

Research aim
The debate over the strengths and weaknesses of MCQs as an assessment tool is neither new nor resolved. The relationship between MCQs and higher-order cognition is one aspect of this ongoing debate. Bloom’s taxonomy has been deemed in the literature a useful tool for examining the ability of MCQs to assess higher-order cognition. This literature, however, tends to be quantitative and to focus on analysing MCQs and mapping them onto Bloom’s taxonomy. Yet this common approach has generated different findings and conclusions, leaving conceptual and pedagogical issues open. Considering this, we sought a different, qualitative approach in order to gain new insights from those who use MCQs.
In the present study, we seek to understand how university teachers think their MCQs
test students’ higher-order cognition. In particular, we are interested in an MCQ’s ability to
assess levels of cognition identified by Bloom’s taxonomy, the extent to which this
assessment tool can do this, and how meaningful such a test is. We acknowledge that
a university teacher’s perspective of what MCQs can assess is inherently subjective and
accept the reported challenges associated with asking teachers to identify levels of
cognition and difficulty an MCQ is designed for (Thompson & O’Loughlin, 2015).
Nevertheless, in the present study, such subjectivity is considered valuable as it reflects the intentionality of teachers who are often the question writers and assessors who assess students’ cognitive development through the MCQs they have created.

Methods
This study was conducted at a research-intensive university in New Zealand from August
to December 2021. The higher education sector in New Zealand is framed and guided by
the Education and Training Act 2020. The Act puts forward a societal expectation that universities in New Zealand should, inter alia, assume the role of critic and conscience of society and aim to develop intellectual independence. The latter, while open to interpretation, is seen to be manifested in graduates having a disposition towards critical
thinking and being guided in thought and in action by their own thinking (Shephard,
2022). Teaching and assessing such critical thinking, Shephard (2022) notes, is compatible
with the cognitive objectives in Bloom’s taxonomy.
The university offers a range of undergraduate programmes broadly covering sciences,
health sciences, humanities and commerce. Across the undergraduate programmes,
MCQs are commonly used for learning (revision) and summative assessment, in particular
as part of mid-term tests or final examinations. Here we are only interested in MCQs used
in summative assessment and grading.

We conducted semi-structured interviews and the participants were teachers who designed or used MCQs at an undergraduate level. After gaining institutional ethics approval, we identified potential participants based on their reputation for using MCQs, their online course outlines that clearly indicated the use of MCQs in summative tests and examinations, or through referrals during the interviews. We sent out 40 email invitations to participate in an interview and eleven teachers volunteered. The participants held teaching-focused, lecturer or associate professor positions. They taught geography, marketing, zoology, medicine, anatomy, chemistry, pharmacy, physiology, biochemistry and biostatistics, mostly at first-year level but with some courses at third-year and honours level. We did not collect data on how
MCQs used in summative tests or examinations were organised in relation to all
the other forms of assessment.
In the invitation, we provided the study rationale and an interview protocol. We also
requested that participants bring one set of MCQs so that these could be analysed during
the interview. We used a semi-structured interview technique to capture how participants
used MCQs and their perceptions of them in relation to assessing higher-order cognition.
Each interview was around 60 minutes in length and was recorded and transcribed.
The interview protocol defined higher-order cognition according to the revised
Bloom’s taxonomy (Figure 1), which included ‘apply’, ‘analyse’, ‘evaluate’ and ‘create’.
The protocol consisted of five broad questions, which are identified below. The questions
were supplemented with inquiries based on participant responses. Data analysis followed
the general inductive approach (Thomas, 2006). Interview transcripts were read by each
researcher independently and emergent research themes agreed. The results are reported
with respect to the research aim and discussed in relation to relevant research.

(a) Could you please share the assessment structure within the course/paper you
teach or design?
(b) Who designed the MCQs?
(c) MCQs (like all assessments) are likely to have a range of impacts on learning:
(i) Can you help us to identify what is being assessed in the MCQ activities? Here is a diagrammatic representation of the revised Bloom’s taxonomy (Figure 1). Which area of Bloom’s taxonomy do you think is assessed in each question of your MCQs?
(ii) For one question you think corresponds with the highest-order category, can you rate its difficulty (1 = low to 6 = high)?
(d) What other approaches have you experienced for assessing higher-order thinking
skills?
(e) Is there anything else you’d like to comment on regarding the use of MCQs?

Findings and discussion


Overall, our analysis suggested that university teachers: (1) considered MCQs able to assess only limited levels of cognition; (2) adopted MCQs mainly out of practical rather than pedagogical concerns; and (3) designed MCQs to differentiate learner performance in addition to assessing cognitive development.

MCQs served to assess limited levels of cognition


The majority of MCQs in a particular test targeted lower-order cognition (‘remember’ and
‘understand’), which may partially be explained by the introductory level of most of the
examinations presented to us. However, all tests also included questions that were
deemed higher-order, mostly at the ‘apply’ level and more rarely at the ‘analyse’ level.
Having a mix of question levels was generally deemed important and overall it was clear
that only small parts of each test were aimed at higher-order cognition.
At the same time, interview discussions showed that assigning a question to a level in
the taxonomy was not exact and decisions usually required careful debate about what the
terms in Bloom’s taxonomy actually meant to the discipline. This is therefore a concern with any approach that maps questions onto taxonomy levels, and claims about what such questions can achieve should be viewed with caution, as what counts as the highest level of cognition for one teacher might not be the same for another. As such, claims that MCQs can teach
critical thinking, for instance, still require careful examination. For example, Kim et al.
(2012) claim that ‘application’ is ‘use a concept in a patient case’ and that ‘synthesis’ is
‘apply multiple factors in a patient case’ (p. 2). It could also be argued that both are
examples of ‘application’ (see Monrad et al., 2021).
In one examination, the higher-order questions were all constructed around the
assertion-reason technique (Williams, 2006) in which complex statements need to be
understood as true or false and then a judgement made about their causal relationship.
while the A, it means that both are correct, and causally related. Whereas B, and these are the
most difficult ones, means the first statement is correct and the second statement is correct.
But they are just independent statements. They don’t have a relation to each other.
(Participant 11)

These assertion-reason questions were thought likely to be at the analysis level of the
taxonomy, but that this would depend on students’ ability to think in an abstract way.
Williams tested the usefulness of this type of question for higher-order cognition, but his
claims were quite modest in that any question that required reasoning was better than
something that required recall. Overall, study participants thought that MCQs were
predominantly used for lower-order cognition but could reach the lower levels of higher-
order cognition (‘apply’ and ‘analyse’). After careful consideration, most did not think
MCQs were capable of assessing the highest categories in Bloom’s taxonomy (‘evaluate’ or
‘create’).
There was some evidence that what was assessed was carefully aligned to what was taught, and this was partly because of a perception that there was a required pathway to knowledge acquisition (although this pathway excluded the top two levels of the taxonomy). As such, adopting the MCQ format may restrict teaching because of beliefs that it is
necessary to first test knowledge requiring lower-order cognitive tasks. However, several
participants suggested that it might be possible to write questions for the higher levels but
these would need to be carefully written and would have a more complex construction.
And to make it an easier applied question, we gave them values that we had used in lab. To
make it a harder applied question, we could have given them values that were unfamiliar to
them. And then they would have been strictly on their own in terms of trying to, to put those
pieces together so that they can do the calculation without any triggers with familiar
numbers. (Participant 8)

The amount of time spent on higher-order learning is also important as these skills require
practice. Participant 8 said:
the average answer time for the MCQs was about one minute . . . but this one [a question
aimed at higher-order cognition], I think most students spend about five minutes on it simply
because they had to do so much thinking.

MCQs served to address practical assessment challenges


A few participants talked about recent changes that led to assessment being exclusively
MCQs. The reasons were predominantly about large and increasing class sizes and the
heavy marking-loads that other forms of assessment, such as open-ended questions and
short answers, required. One participant added that MCQs are ‘good for saving time’ on
assessment but they also allow inexperienced assessors (e.g. students and postgraduates)
to teach others and grade them. There was also one report of a course that shifted totally
to MCQs to cope with the COVID-19 pandemic:
So they just thought we’ll just do everything MCQs this year and then go back to normal. But
then they [teachers] thought this is nice and convenient. And performance was roughly the
same in the exam, so they just decided to keep it. (Participant 9)

Overall, there was a general concern that while MCQs have merit, they were a practical
choice rather than an ideal option. Adopting them as an assessment method, however,
influenced the curriculum in terms of what knowledge could be tested and what could be
taught, and it was suggested that ‘MCQs are great if you understand what you can and
can’t ask’ (Participant 10). Courses were mostly taught by a team of academics and thus
changes to assessment procedures were typically carefully discussed, but once
embedded, substantial change became more difficult. Academic debate was an integral
part of using MCQs, and teachers spent much time on the complex challenges of question
construction and refinement. This process was seen as inherently positive, sometimes
frustrating but also potentially developmental for teachers.
There was some indication that assessment discussions went beyond a specific course
to include a more encompassing evaluation of a degree programme. Participants thought
that students should encounter a range of assessment methods during their time at
university. However, instead of having different forms of assessment within the same
course that might complement each other (in terms of the knowledge and skills assessed),
a majority seemed to accept that it was adequate for students to encounter only MCQs
early on in their study if, later on, other forms of assessment were used. This reflects
a wider programme approach to assessment, which is positive because it requires com­
munication and planning across courses. However, only one respondent described this
process in detail and showed how MCQs played a part in what they wanted students to
achieve at university. None talked about the balance between assessment of and for
learning, although some talked of giving feedback in the context of returning the correct
test answers to the students.
The common practice in team-taught courses was that each lecturer provided the
questions for the materials they covered. These were then collated and typically quality
assured by the course convenor. But departments had a range of different processes for
quality assurance and pre- and post-testing of questions, reflecting different levels of
rigour or allocation of resources towards the process. Study participants also had different
views of and values towards MCQs. Some had researched MCQs (one had significant expertise in this topic and had published in the area) and were aware of wider practices
such as the use of common language ‘tricks’ to make questions more difficult or asking
what the wrong answer is. Several participants expressed a wish to learn how to write
better questions, particularly at the higher-order cognition level.
In many introductory courses – especially in the sciences and health sciences –
students were mainly assessed by MCQs and they had become proficient in seeking
MCQ cues (see Downing, 2002) and patterns. In one case, however, they were asked to
write a short essay:

They perform terribly . . . partly because of writing ability, but also partly because they’re not
very good at interpreting the questions. And actually, for instance, like if we ask them
a question about the cardiovascular system, they say ‘Okay, this is about the cardiovascular
system, I’m going to write everything that I know about the heart and the vasculature’,
whereas they don’t actually address the key points that are asked of them. (Participant 4)

Here we see students experiencing difficulty with, for them, a relatively novel form of
assessment that requires a different approach to learning and examination. It also high­
lights the general danger of using only one type of assessment tool. Stanger-Hall and
Chudler (2012) reached two related conclusions in their study: 1) the MCQ-only format is
detrimental to critical thinking and 2) while a mixed format of MCQs and short answers
requires additional resources for grading, it is worthwhile doing so for improving critical
thinking skills.
Another participant reported that because grades from an equivalent MCQ and short
answer test were the same, this was seen as justification for replacing the short answer
test with MCQs. However, another compared short-answer examinations with MCQs and
claimed that although both reached the ‘analyse’ level of Bloom’s taxonomy, students
learned more from the essay task. Essays also allow for feedback, which is part of higher-
order cognition and an integral part of assessing complex knowledge (Harland & Wald,
2021; Scouller, 1998). In MCQs, any attempt to deal with feedback was seen as proble­
matic. For example, one respondent claimed that if the test itself was returned to
students, there was a risk that it would be passed on or even sold to others taking the
examination the following year.
When MCQs are equated with better learning outcomes (Williams, 2006), this view is about the assessment tool itself and the measurement of a student’s knowledge and ability, but not about the student as a contributor to that knowledge
and their ability to do so. Such an idea is unlikely to be controversial when applied
to essay or report writing, which are complex tasks that require sufficient time for
research, evaluation and synthesis of findings and careful writing. It is not, how­
ever, clear how MCQs might generate – rather than assess or measure – any such
learning outcomes. One possibility might be that this kind of assessment requires
some preparation that would generate higher-order outcomes, which highlights the distinction between assessment of learning and assessment for learning, and the idea
that some assessment formats promote student learning while others are more
a tool for measurement and accreditation (Wiliam, 2011). Nicol (2007), for instance,
has raised concerns about the ability of MCQs to involve students more actively in
assessment. If MCQs are used for revision exercises, they are likely to impact
differently on learning to those used for examinations. In the latter case it seems
that MCQs predominantly assess what has been learned, and when they do
promote the learning process, it is within narrow parameters with respect to
memorisation.

MCQs served to differentiate learner performance


Even if an MCQ could reach the highest level of Bloom’s taxonomy, it appears that all tests
aimed for a range of difficulty, typically expressed as ‘easy, medium and hard questions’:
they usually have a couple of easy ones, and then maybe a couple of medium and then one
hard or something like that for each module. (Participant 4)

It was important to maintain the mix of questions for consistency of difficulty level
between cohorts and also for when students needed to re-sit an exam with new
questions:
we need to make sure that we’re replacing similar questions, or at least what we think are
similar questions. And we don’t replace, like an easy question with a hard one or something
like that. (Participant 4)

As noted above, the level of difficulty could be established as the relationship between
a student’s ability and the perceived level of difficulty a question entails, according to
subject experts (Tractenberg et al., 2013). To examine the relationship between difficulty
and cognition we asked participants to judge the difficulty of the one question in their
test that they thought offered the highest level on Bloom’s taxonomy (in all cases,
‘analyse’). Most of the questions were not seen as particularly difficult, with an average score of 3.75 (range 3–6, on a scale of 1–6 where 6 is most difficult). As such, higher-order
questions may not necessarily challenge students.
In contrast, participants suggested that lower-order questions can be hard to answer.
In one example, students had to recall facts from two different parts of the course, and
they struggled to do this. A second example involved increasing question complexity by
simply changing the question to contain more or different words while testing the same
knowledge. This is a well-known problem for students whose first language is not the test
language; however, reading ability and strategies are linked to cognition level (Veeravagu
et al., 2010). Other participants noted the importance of exam preparation skills, saying
that if students were prepared for answering complex or higher-order questions (e.g. by
working through examples, test rehearsal or writing their own practice questions), they
found it much easier in the examination (see Karpen & Welch, 2016).
In addition, interview data suggested that ‘difficulty’ was a subjective concept:
I would classify a question one way. And my colleague would classify it a different way. And so
we find that we have to have multiple people looking to try and come to an agreement.
(Participant 2)

Similarly, such judgements can be subjective for students. Stringer et al. (2021) looked at
student perceptions of cognitive difficulty and found that first-year medical students who
answered an MCQ incorrectly tended to perceive it to be of higher-order, regardless of
how teachers designed it. These authors also found that high-performing students sought
cues for higher-order questions while lower-performing students engaged in analytical
thinking, even when the question was of a lower-order. Zaidi et al. (2018) suggest that
students’ approaches should be considered in the debate about the relationship between
MCQs and Bloom’s taxonomy. These authors claim that one student could use the two
middle levels of Bloom's taxonomy to answer a factual question, while another may use
a different approach, such as recall. Such a claim remains to be tested but there is some
evidence to support this general idea (Huxham & Naeraa, 1980).
What was clear is that difficulty was a major concern and question effectiveness was
judged through an analysis of past and present examination data, taking into account
expectations of who should have answered questions correctly by tracking individual
performance over time. This was a form of norm-referencing in a test where student pass-
rate data are expected to fit a normal curve, and such an approach pre-determines that
not all students will achieve an A grade, even if they were all capable of demonstrating
this in other assessment circumstances. Under these conditions, one of the main purposes
of MCQs is to use relative standing to tell students apart in the norm-group.
Differentiation was seen as necessary when students were competing for programmes that had selective entry. However, the university assessment guidelines state that all assessment should be criterion-referenced (using external criteria and not related to another student’s performance), and so this brought into question the appropriateness of MCQs for meeting policy requirements, or the suitability of policy for the pragmatics and politics of differentiating students through grades at university.
Overall, it appears that the relationship between difficulty, cognitive level and the
quality of thinking is extremely complex. Teachers who may wish to teach higher-order
cognition will nevertheless restrict themselves to what MCQs do well, thus reinforcing
beliefs about knowledge hierarchies.

Conclusion
The study showed that in relation to Bloom’s taxonomy, university teachers who use MCQs
believe these can be written to examine higher-order cognition but most likely only at the
first two of the higher levels (‘apply’ and ‘analyse’, Figure 1, Zheng et al., 2008). In practice,
questions at these levels usually formed a small part of any test. The most important reason
given for using MCQs for examinations was economies of scale related to very large classes
and an implicit need for comparing student performances. In our study, MCQs were
predominantly used as the only form of assessment in science subjects in the first year
of university. Writing high-quality questions was a careful task that required considerable investment, and participants sought a mix of difficulty levels so that students could be compared against each other and teachers could tell them apart. Training students in MCQ examinations or question preparation was also believed to help in all test situations.
Importantly, not only was student learning and behaviour changed by MCQs but
also teacher behaviour. There appeared to be an effect of using this form of assess­
ment on both teaching and ease of curriculum change. For example, once MCQs were
created as a bank of questions, they framed the curriculum in such a way that they
rigidly determined the parameters of what was possible to teach and so teaching
became aligned to the test. Any deviation from this would require investment in
question development and removal of those no longer relevant, and often a long
process of negotiation, discussion and testing. The end result would not necessarily be
seen as a problem because lecture material was aligned to a test that was able to
objectively differentiate student ability.
The relationship between question difficulty and cognitive level was not fully resolved
and this complex issue needs further work. Our analysis also raised some broader ques­
tions related to MCQs:

● Whether or not teachers would have preferred to teach more towards the higher levels of Bloom’s taxonomy at first year;
● What was teachers’ understanding of their broader obligations to graduate attri­
butes and a student’s overall education?

We know little about the relationship between the different skills in Bloom’s taxonomy and
although it has been assumed that those of a higher-order include lower-order, if and how this
occurs is uncertain. Any inquiry into this relationship will need to consider discipline, how the
course is taught and other context-related factors (Thompson & O’Loughlin, 2015). Higher-
order skills are typically developed in the context of learning a subject and then assessed as
complex knowledge with appropriate methods (Harland & Wald, 2021). Phillips et al. (2013),
for example, successfully mapped all of Bloom’s objectives to radiology learning in an
examination that required short answers.
The discipline, particularly in the sciences, is often understood as specific factual and theoretical subject knowledge. However, if critical thinking is seen as integral to the subject, then subject thinking can be accepted as critical thinking, rather than as a skill that is either part of the subject or something separately applied to it. The facts and theories are then learned within a critical context, such as research inquiry, and the student learns to think like a radiologist, clinician, physicist, and so on. This conception of education requires different approaches to teaching that are often reflected in the later years of a student’s university education. Such an education can meet the highest levels of Bloom’s taxonomy, but this concept of learning is very unlikely to be assessed meaningfully by MCQs.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes on contributors
Qian Liu is a Lecturer at the Higher Education Development Centre. His research revolves around
learning and teaching in higher education.
Navé Wald is a Lecturer at the Higher Education Development Centre. His recent research focuses
on critical thinking in assessment practice and doctoral co-supervision. His teaching interests
include supporting those new to research in higher education, helping students at all levels to
develop their critical skills, and promoting assessment literacy.

Chandima Daskon is a Research Fellow at the Higher Education Development Centre, University of
Otago. She is a multi-disciplinary researcher with particular interests in development studies and
qualitative research methodologies.
Tony Harland is Professor of Higher Education. His recent research examined how undergraduates
learn through doing research, how assessment affects student behaviour and the quality of their
education, and research methods in higher education. Tony teaches research methods in higher
education and other topics such as learning theory and peer review.

ORCID
Qian Liu http://orcid.org/0000-0002-1412-0615
Navé Wald http://orcid.org/0000-0002-0038-9322
Chandima Daskon http://orcid.org/0000-0003-1562-9701
Tony Harland http://orcid.org/0000-0002-0381-9949

References
Barak, M., Ben-Chaim, D., & Zoller, U. (2007). Purposely teaching for the promotion of higher-order
thinking skills: A case of critical thinking. Research in Science Education, 37(4), 353–369. https://doi.
org/10.1007/s11165-006-9029-2
Bloom, B. S., Engelhart, M. D., Furst, E., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational
objectives: The classification of educational goals. Handbook 1: Cognitive domain. David McKay.
Douglas, M., Wilson, J., & Ennis, S. (2012). Multiple-choice question tests: A convenient, flexible and
effective learning tool? A case study. Innovations in Education and Teaching International, 49(2),
111–121. https://doi.org/10.1080/14703297.2012.677596
Downing, S. M. (2002). Threats to the validity of locally developed multiple-choice tests in medical
education: Construct-irrelevant variance and construct underrepresentation. Advances in Health
Sciences Education, 7(3), 235–241. https://doi.org/10.1023/A:1021112514626
George, S., Haque, M. S., & Oyebode, F. (2006). Standard setting: Comparison of two methods. BMC
Medical Education, 6(1), 1–6. https://doi.org/10.1186/1472-6920-6-46
Harland, T. (2020). University challenge: Critical issues for teaching and learning. Routledge.
Harland, T., & Wald, N. (2021). The assessment arms race and the evolution of a university’s
assessment practices. Assessment & Evaluation in Higher Education, 46(1), 105–117. https://doi.
org/10.1080/02602938.2020.1745753
Huxham, G. J., & Naeraa, N. (1980). Is Bloom’s Taxonomy reflected in the response pattern to MCQ
items? Medical Education, 14(1), 23–26. https://doi.org/10.1111/j.1365-2923.1980.tb02608.x
Jensen, J. L., McDaniel, M. A., Woodard, S. M., & Kummer, T. A. (2014). Teaching to the test . . . or
testing to teach: Exams requiring higher order thinking skills encourage greater conceptual
understanding. Educational Psychology Review, 26(2), 307–329. https://doi.org/10.1007/s10648-
013-9248-9
Karpen, S. C., & Welch, A. C. (2016). Assessing the inter-rater reliability and accuracy of pharmacy
faculty’s Bloom’s Taxonomy classifications. Currents in Pharmacy Teaching & Learning, 8(6),
885–888. https://doi.org/10.1016/j.cptl.2016.08.003
Kim, M.-K., Patel, R. A., Uchizono, J. A., & Beck, L. (2012). Incorporation of Bloom’s taxonomy into
multiple-choice examination questions for a pharmacotherapeutics course. American Journal of
Pharmaceutical Education, 76(6), 114. https://doi.org/10.5688/ajpe766114
Kıyak, Y. S., Budakoğlu, I. İ., Bakan Kalaycıoğlu, D., Kula, S., & Coşkun, Ö. (2022). Can preclinical
students improve their clinical reasoning skills only by taking case-based online testlets?
A randomized controlled study. Innovations in Education and Teaching International, 60(3),
1–10. https://doi.org/10.1080/14703297.2022.2041458
Krathwohl, D. R. (2002). A revision of Bloom’s taxonomy: An overview. Theory into Practice, 41(4),
212–218. https://doi.org/10.1207/s15430421tip4104_2

Malau-Aduli, B. S., Assenheimer, D., Choi-Lundberg, D., & Zimitat, C. (2014). Using computer-based
technology to improve feedback to staff and students on MCQ assessments. Innovations in
Education and Teaching International, 51(5), 510–522. https://doi.org/10.1080/14703297.2013.
796711
Monrad, S. U., Bibler Zaidi, N. L., Grob, K. L., Kurtz, J. B., Tai, A. W., Hortsch, M., Gruppen, L. D., &
Santen, S. A. (2021). What faculty write versus what students see? Perspectives on multiple-choice
questions using Bloom’s taxonomy. Medical Teacher, 43(5), 575–582. https://doi.org/10.1080/
0142159X.2021.1879376
Nicol, D. (2007). E‐assessment by design: Using multiple‐choice tests to good effect. Journal of
Further and Higher Education, 31(1), 53–64. https://doi.org/10.1080/03098770601167922
Phillips, A. W., Smith, S. G., & Straus, C. M. (2013). Driving deeper learning by assessment: An
adaptation of the revised Bloom’s taxonomy for medical imaging in gross anatomy. Academic
Radiology, 20(6), 784–789. https://doi.org/10.1016/j.acra.2013.02.001
Scouller, K. (1998). The influence of assessment method on students’ learning approaches: Multiple
choice question examination versus assignment essay. Higher Education, 35(4), 453–472. https://
doi.org/10.1023/A:1003196224280
Shephard, K. (2022). On intellectual independence: The principal aim of universities in New Zealand.
New Zealand Journal of Educational Studies, 57(1), 269–284. https://doi.org/10.1007/s40841-022-
00250-7
Stanger-Hall, K. F., & Chudler, E. H. (2012). Multiple-choice exams: An obstacle for higher-level
thinking in introductory science classes. CBE—Life Sciences Education, 11(3), 294–306. https://
doi.org/10.1187/cbe.11-11-0100
Stringer, J. K., Santen, S. A., Lee, E., Rawls, M., Bailey, J., Richards, A., Perera, R. A., & Biskobing, D.
(2021). Examining Bloom’s taxonomy in multiple choice questions: Students’ approach to ques­
tions. Medical Science Educator, 31(4), 1311–1317. https://doi.org/10.1007/s40670-021-01305-y
Thomas, D. R. (2006). A general inductive approach for analyzing qualitative evaluation data.
American Journal of Evaluation, 27(2), 237–246. https://doi.org/10.1177/1098214005283748
Thompson, A. R., & O’Loughlin, V. D. (2015). The Blooming Anatomy Tool (BAT): A discipline-specific
rubric for utilizing Bloom’s taxonomy in the design and evaluation of assessments in the
anatomical sciences. Anatomical Sciences Education, 8(6), 493–501. https://doi.org/10.1002/ase.
1507
Tractenberg, R. E., Gushta, M. M., Mulroney, S. E., & Weissinger, P. A. (2013). Multiple choice questions
can be designed or revised to challenge learners’ critical thinking. Advances in Health Sciences
Education, 18(5), 945–961. https://doi.org/10.1007/s10459-012-9434-4
Veeravagu, J. V. J., Muthusamy, C., Marimuthu, R., & Michael, A. S. (2010). Using Bloom’s taxonomy to
gauge students’ reading comprehension performance. Canadian Social Science, 6(3), 205–212.
https://doi.org/10.3968/j.css.1923669720100603.023
Veloski, J. J., Rabinowitz, H. K., Robeson, M. R., & Young, P. R. (1999). Patients don’t present with five
choices: An alternative to multiple-choice tests in assessing physicians’ competence. Academic
Medicine, 74(5), 539–546. https://doi.org/10.1097/00001888-199905000-00022
Wiliam, D. (2011). What is assessment for learning? Studies in Educational Evaluation, 37(1), 3–14.
https://doi.org/10.1016/j.stueduc.2011.03.001
Williams, J. B. (2006). Assertion‐reason multiple‐choice testing as a tool for deep learning:
A qualitative analysis. Assessment & Evaluation in Higher Education, 31(3), 287–301. https://doi.
org/10.1080/02602930500352857
Zaidi, N. L. B., Grob, K. L., Monrad, S. M., Kurtz, J. B., Tai, A., Ahmed, A. Z., Gruppen, L. D., & Santen, S. A.
(2018). Pushing critical thinking skills with multiple-choice questions: Does Bloom’s taxonomy
work? Academic Medicine, 93(6), 856–859. https://doi.org/10.1097/acm.0000000000002087
Zheng, A. Y., Lawhorn, J. K., Lumley, T., & Freeman, S. (2008). Application of Bloom’s taxonomy
debunks the ‘MCAT myth’. Science, 319(5862), 414–415. https://doi.org/10.1126/science.1147852
