
Applied Ergonomics 106 (2023) 103906

Contents lists available at ScienceDirect

Applied Ergonomics
journal homepage: www.elsevier.com/locate/apergo

Review article

Are two-person checks more effective than one-person checks for safety
critical tasks in high-consequence industries outside of healthcare? A
systematic review
Ryan D. McMullan a,*, Rachel Urwin a, Mark Wiggins b, Johanna I. Westbrook a
a Australian Institute of Health Innovation, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
b Centre for Elite Performance, Expertise, and Training, Macquarie University, Sydney, Australia

ARTICLE INFO

Keywords: Double-checking; High-consequence industries; Systematic review

ABSTRACT

Double-checking has been used in high-consequence industries for decades. We aimed to determine the strength of the evidence-base regarding the effectiveness of double-checking which underpins its widespread adoption. We searched for quantitative studies of the effectiveness of two-person checking in industry sectors, excluding healthcare. We performed a systematic literature search across six databases and hand-searched key journals. We completed a narrative synthesis and quality assessment of the nine studies identified. Most studies were of fair quality. Two examined the use of two-person checks in aviation, three investigated tasks in chemical manufacturing, and four studies in psychology involved proofreading and visual search tasks. All studies found that the performance of two-people checking was not superior to that of one-person in detecting errors. Further research to compare the effectiveness of different checking processes along with factors which may support optimisation of safety checks in high-consequence industries is required.

1. Introduction

“Arm doors and cross-check” is a familiar refrain in the standard operating procedures of modern airline travel. In essence, the procedure requires flight attendants to check that their colleagues have correctly armed/disarmed airline doors to ensure that emergency escape slides are operational/disengaged according to the configuration of the aircraft. This principle of redundancy, often applied in engineering and employed in settings as disparate as chemical plants and hospitals, is intended to reduce the likelihood of system failures and increase reliability by requiring that a number of people or processes independently perform the same function.

In healthcare, ‘double-checking’ has been a recommended and mandated practice in many clinical situations for decades. It is often performed during the administration of medication due to the high frequency of errors that occur at this stage of the drug treatment process (Keers et al., 2013). In this context, double-checking involves two nurses verifying the same information. To improve reliability, it has been recommended that nurses complete ‘independent double checks’, where they each independently verify the information (ISMP, 2003). This is based on the assumption that if these processes are undertaken correctly, the threat of individual biases (e.g., confirmation bias) will be minimised, such that two nurses are unlikely to make the same error at the same time (ISMP, 2013).

Double-checking or cross-checking is also used in other high-consequence industries as a process for building redundancy, a recognised component of high reliability organisations (Hofmann et al., 1995; Lekka, 2011; Roberts, 1990). In multi-crew cockpits, the pilot monitoring will call out a checklist item that will be confirmed or corrected by the pilot-flying. Similarly, in communicating with air traffic control, a pilot will read-back a command that will be checked and confirmed by the air traffic controller. Different operators at chemical plants monitor and cross-check the same automated processes as a strategy to prevent errors. This reflects an inherent belief that, should one operator fail to identify an irregularity in the system, another operator will compensate for this failure by responding appropriately.

The nature of the checking task can differ, such that checking can involve matching information (e.g., matching flight panel information against a checklist in an aircraft cockpit) (Pfeiffer et al., 2020; White et al., 2010), drawing on knowledge and critically analysing information (e.g., checking to determine the dose of a medication based upon a child’s body weight) (Pfeiffer et al., 2020; White et al., 2010), or

* Corresponding author. Australian Institute of Health Innovation, Level 6, 75 Talavera Road, Macquarie University, NSW, 2109, Sydney, Australia.
E-mail address: ryan.mcmullan@mq.edu.au (R.D. McMullan).

https://doi.org/10.1016/j.apergo.2022.103906
Received 15 June 2022; Received in revised form 12 September 2022; Accepted 13 September 2022
Available online 20 September 2022
0003-6870/© 2022 Elsevier Ltd. All rights reserved.

monitoring (e.g., monitoring a system that autonomously controls chemical processes) (Cymek, 2018). Effective checking behaviour requires the allocation of attentional resources to compare the features of the system to the features expected at the particular point in time. However, this comparative process is cognitively demanding and where the process becomes routine, attentional resources may be withdrawn, leading to checking errors. That is, the completion of routine tasks can become ritualised, so they are not performed with the degree of attention necessary to identify deviations (Toft and Mascie-Taylor, 2005).

Using two people in the checking process is intended to heighten performance, increasing the likelihood that an anomaly will be detected. However, there is little good quality evidence to confirm whether this is the case in medication administration (Koyama et al., 2020). Further, there is limited research evidence regarding factors that improve or hinder checking performance. Our systematic review of the effectiveness of double-checking to reduce medication administration errors identified a lack of evidence to support the process (Koyama et al., 2020). A subsequently published, large empirical study of the effects of double-checking on medication administration errors among 1523 paediatric inpatients found that independent double-checking was rarely performed, even when mandated by hospital policy (Westbrook et al., 2020). However, ‘primed double-checking’, where one nurse shares information or primes a second nurse with information during the checking process, was highly prevalent. No significant association was evident between primed, mandated double-checking and the rate or severity of medication administration error. The results from this large study raise further questions about the possible safety value of double-checking and highlight gaps in our understanding of when and how double-checking may deliver the safety benefits anticipated.

The evidence-base for the effectiveness of double/cross-checking in industries outside healthcare has not been systematically reviewed. Thus, the extent to which the safety benefits attributed to such processes are warranted is unknown. Given the extensive use of double-checking in industries outside healthcare, an assessment of the extent and quality of the evidence to support this process is important to both identify evidence gaps and potential recommendations for optimisation of checking processes.

The objective of this systematic review was to determine the strength and quality of evidence regarding the effectiveness of double-checking in safety critical industries outside healthcare including aviation, nuclear power, engineering, and chemical manufacturing. We performed a systematic search for quantitative evidence to address the following research question: Is two-person checking more effective than one-person checking in detecting errors and system variations in high-consequence industries outside of healthcare?

2. Method

This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline (Page et al., 2021). The protocol for our systematic review was not pre-registered.

2.1. Search strategy

We performed a search of PsycINFO, Web of Science, Advanced Technologies and Aerospace, Health & Safety Science Abstracts, Google Scholar, and Scopus databases for scientific literature published from database inception to April 12, 2022. The search strategy (see Appendix A) was developed with the assistance of a research librarian and included categories for checking, error, and relevant industries and disciplines. Synonyms within these categories were linked with the Boolean operator “OR” and linked across categories using “AND”. The results of the search were merged using reference management software (Endnote 20). After the removal of duplicates, two authors independently screened titles and abstracts. Studies that were not relevant were excluded. The same two authors examined full-text articles to judge their eligibility in accordance with the inclusion criteria. Disagreement or uncertainty between the two authors was resolved through discussion. Reference lists and citations of included articles were inspected for additional studies. The following key journals were hand-searched for studies to ensure the completeness of the search (Hopewell et al., 2007): Human Factors, Applied Ergonomics, Journal of Aviation/Aerospace Education and Research, and Safety Science.

2.2. Inclusion criteria

Inclusion was contingent on the following: (a) an empirical quantitative study (experimental, simulation, observational); (b) reported quantitative outcomes related to the effectiveness of two-person checking for the detection of errors or system deviations; (c) the study was conducted outside of healthcare; (d) published in a peer-reviewed journal; and (e) was written in English. Studies were required to meet all of the criteria to be included.

2.3. Data extraction

One author extracted data from the studies, comprising: first author; year of publication; country; sector; study design; participants; nature of task/s; outcome measures; and key findings. Key findings included results that related to the detection of errors, error rates, adherence to checks, and detection of system failures when participants worked individually versus in pairs. Extracted data were compiled in a table using Microsoft Excel (data available on request from the authors).

2.4. Quality assessment

One author assessed the quality of the studies by using a quality assessment tool used widely to assess the methodological quality and risk of bias in experimental studies (Hawker et al., 2002). This tool assesses studies on each of nine categories: abstract and title; introduction and aims; method and data; sampling; data analysis; ethics and bias; results; transferability and generalizability; and implications and usefulness. Each category is scored from good (4 points) to very poor (1 point) using corresponding criteria (see Appendix B). A global quality grade is defined based on the total score: good (28–36 points), fair (19–27 points), poor (10–18) and very poor (1–9 points).

2.5. Data synthesis

Due to the heterogeneity of the design and outcome measures of the included studies, we completed a narrative synthesis of the studies.

3. Results

3.1. Results and study characteristics

The literature search retrieved 1661 articles. After title and abstract screening, 1651 articles were excluded. One article was excluded after full-text review. A total of nine studies were included in the qualitative synthesis (Fig. 1). Study characteristics are outlined in Table 1.

Five studies were conducted in the United States of America (USA) (Brown, 1963; Brown and Fox, 1965; Contte and Jacobs, 1997; Mosier et al., 2001; Skitka et al., 2000), three in Germany (Cymek, 2018; Cymek et al., 2016; Domeinski et al., 2007), and one in Japan (Nihei et al., 2002). All of the studies were experiments. The year of publication ranged from 1963 to 2018. Most studies were of fair quality (see Appendix B) (Brown, 1963; Brown and Fox, 1965; Cymek et al., 2016; Domeinski et al., 2007; Mosier et al., 2001; Nihei et al., 2002; Skitka et al., 2000). None of the studies published power calculations. Two of the studies examined the use of two-person checks involving tasks in


Fig. 1. PRISMA flow diagram of selection process.
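
The counts reported in Section 3.1 for the selection process can be checked with simple arithmetic. The snippet below is a minimal sketch and not part of the original review: the function name `prisma_flow` and the dictionary keys are hypothetical, and only the three input numbers (1661 retrieved, 1651 excluded at title and abstract screening, one excluded at full-text review) are taken from the text.

```python
def prisma_flow(retrieved: int, excluded_screening: int, excluded_full_text: int) -> dict:
    """Return the number of records remaining after each selection stage."""
    # Records that pass title and abstract screening go on to full-text review.
    full_text_assessed = retrieved - excluded_screening
    # Records that pass full-text review are included in the synthesis.
    included = full_text_assessed - excluded_full_text
    if full_text_assessed < 0 or included < 0:
        raise ValueError("exclusions exceed the number of records available")
    return {
        "retrieved": retrieved,
        "full_text_assessed": full_text_assessed,
        "included": included,
    }


# Numbers as reported in Section 3.1.
flow = prisma_flow(retrieved=1661, excluded_screening=1651, excluded_full_text=1)
print(flow)  # {'retrieved': 1661, 'full_text_assessed': 10, 'included': 9}
```

Under these assumptions, ten articles reach full-text review and nine are included, which agrees with the nine studies listed in Table 1.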

aviation (Mosier et al., 2001; Skitka et al., 2000), three studies involved tasks in chemical manufacturing (Cymek, 2018; Cymek et al., 2016; Domeinski et al., 2007), and four studies in psychology included proofreading and visual search tasks (Brown, 1963; Brown and Fox, 1965; Contte and Jacobs, 1997; Nihei et al., 2002).

Of the studies of double-checking in aviation (Mosier et al., 2001; Skitka et al., 2000), one examined the performance of one-person versus two-person crews in detecting automation errors (e.g., misloads of frequency, altitude, arrival) in a flight simulator (Mosier et al., 2001). Forty-eight pilots from three major commercial United States carriers flew under conditions that varied based on crew size (i.e., they either worked alone or with another participant), type of training, and whether they received a prompt to verify automated functioning. Six automation failures were included in the three stages of the simulation. The outcomes indicated that there was no significant difference in performance between single pilot and dual-pilot operations.

A second study investigated automation bias by having university students complete computerised simulation monitoring and tracking tasks involved in flying commercial aircraft (Skitka et al., 2000). One hundred and forty-four participants completed these tasks under conditions that varied in crew size, type of training, and whether they received a prompt to verify automated directives. The Automated Monitoring Aid (AMA) on the task display provided participants with notifications about events together with recommendations for specific courses of action. Participants were advised that this aid was not 100% accurate in detecting critical events. Across four trials, there were twelve opportunities for error due to six occasions during which the AMA failed to notify participants of an event, and six occasions on which the AMA offered inappropriate directives. The results indicated that individuals and pairs were equally likely to miss events and fail to respond to system irregularities.

Three studies investigated redundancy and automation monitoring in the context of chemical plants (Cymek, 2018; Cymek et al., 2016; Domeinski et al., 2007). All three involved university students performing a complex task that simulated operations in the control room of a chemical plant. The first study compared participants who worked alone (non-redundant group), participants with a second crewmember who worked in parallel on the task (redundant group), and those who were informed that the second crewmember’s performance may be low (informed redundant group) (Domeinski et al., 2007). The findings indicated that participants who worked in parallel with a second person (redundant group) cross-checked the automation significantly less frequently than the other groups. Subsequent studies reported similar findings, such that participants who worked redundantly engaged in less cross-checking


Table 1
Overview of studies that examined two-person checks.

| Author/s, year, country | Sector | Study design | Participants | Nature of task | Key outcome measures | Key findings | Quality rating |
|---|---|---|---|---|---|---|---|
| (Brown, 1963), USA | Psychology | Experiment: Within-subjects design | 12 male employees at the research laboratory | Visual monitoring task | Number of positive errors and negative errors | More positive errors in the redundant condition for low and intermediate levels of difficulty. More positive errors in the nonredundant condition for high level of difficulty. | Fair |
| (Brown and Fox, 1965), USA | Psychology | Experiment: Within-subjects design | 16 male employees at the research laboratory | Visual monitoring task | Number of positive errors and negative errors | Fewer negative errors in the redundant compared to the nonredundant condition. Fewer positive errors in the nonredundant compared to the redundant condition. | Fair |
| (Contte and Jacobs, 1997), USA | Psychology | Experiment: Between-subjects design | 173 undergraduate students | Transcript checking task | Error rate: omission errors, commission errors, categorisation errors | Participants worked faster but made errors more frequently when in a perceived reliable, redundant system compared to a condition where they worked alone or in a perceived unreliable, redundant system (β = .16, p < .05). | Good |
| (Cymek et al., 2016), GER | Chemical manufacturing | Experiment: 2 x 12 mixed factorial design | 36 participants (35 were students) | Multi-task consisting of three subtasks similar to those in a chemical plant control room | Monitoring performance: number of trials where the automation was fully cross-checked | Participants who worked redundantly with another participant performed significantly less cross-checks than participants who worked alone (M = 84.56 vs M = 112.28, p = .03). | Fair |
| (Cymek, 2018), GER | Chemical manufacturing | Experiments: Exp 1: 3 x 12 mixed factorial design. Exp 2: 2 x 12 mixed factorial design | Exp 1: 47 university students. Exp 2: 36 university students | Multi-task consisting of three subtasks similar to those in a chemical plant control room | Number of trials where the automation was fully cross-checked. Detection of automation failures | Exp 1: Participants in the redundant condition detected significantly less automation failures compared to participants in the nonredundant condition (65.6% vs. 89.9%, p = .046). Exp 1 and Exp 2: Participants in the redundant condition cross-checked significantly less status messages compared to participants in the nonredundant condition (Exp 1: 70.8% vs. 86.7%, p = .019; Exp 2: 55.0% vs. 75.8%, p = .024). | Good |
| (Domeinski et al., 2007), GER | Chemical manufacturing | Experiment: mixed factorial design | 36 students | Multi-task consisting of three subtasks similar to those in a chemical plant control room | Number of trials where the automation was fully cross-checked | Participants in the redundant condition cross-checked the automation significantly less than participants in the nonredundant condition (F(2,33) = 13.05, p < .01). | Fair |
| (Mosier et al., 2001), USA | Aviation | Experiment: mixed factorial design | 48 commercial glass cockpit pilots | Flight simulator task | Number of omission errors related to automation failures | No significant difference for omission error rates between one-person and two-person crews (52% vs 43%; p > .05). | Fair |
| (Nihei et al., 2002), JAP | Psychology | Experiment: Between-groups design | 48 undergraduate students | Proofreading task | Number of contextual errors and surface errors detected | Pairs of participants detected significantly more surface errors than individual participants (t(30) = 5.296, p < .001), whereas individual participants detected significantly more contextual errors than the pairs of participants (t(15) = 2.366, p < .05). | Fair |
| (Skitka et al., 2000), USA | Aviation | Experiment: 2 x 3 x 2 between-subjects design | 144 undergraduate students | Monitoring and tracking tasks | Number of omission errors and number of commission errors | No significant differences between one-person and two-person crews for omission and commission errors (p > .05). | Fair |
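
The quality ratings in Table 1 follow the scoring scheme described in Section 2.4: nine categories, each scored from 1 (very poor) to 4 (good), with the total mapped to a global grade. The snippet below is a minimal sketch of that mapping, not the authors' own procedure or any published implementation of the Hawker et al. (2002) tool. The function name, constants, and example scores are hypothetical; the category names and grade bands are the ones stated in the text.

```python
from typing import Mapping, Tuple

# The nine categories listed in Section 2.4.
CATEGORIES = (
    "abstract and title",
    "introduction and aims",
    "method and data",
    "sampling",
    "data analysis",
    "ethics and bias",
    "results",
    "transferability and generalizability",
    "implications and usefulness",
)

# Lower bound of each grade band, highest first (good 28-36, fair 19-27,
# poor 10-18, very poor 1-9), as stated in Section 2.4.
GRADE_BANDS = ((28, "good"), (19, "fair"), (10, "poor"), (1, "very poor"))


def global_quality_grade(scores: Mapping[str, int]) -> Tuple[int, str]:
    """Sum the nine category scores and map the total to a global grade."""
    if set(scores) != set(CATEGORIES):
        raise ValueError("scores must cover exactly the nine categories")
    for category, points in scores.items():
        if points not in (1, 2, 3, 4):
            raise ValueError(f"{category}: expected 1 to 4 points, got {points}")
    total = sum(scores.values())
    for lower_bound, grade in GRADE_BANDS:
        if total >= lower_bound:
            return total, grade
    raise AssertionError("unreachable: the total is always at least 9")


# Hypothetical scores for illustration only; not taken from any included study.
example = {category: 3 for category in CATEGORIES}
print(global_quality_grade(example))  # prints (27, 'fair')
```

With nine categories and a minimum of one point each, the lowest attainable total is 9, so the "very poor" band can only be reached at its upper edge.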

than participants who worked in isolation (Cymek, 2018; Cymek et al., 2016).

There were four studies conducted in psychology. One study compared participant performance when they either worked alone or in a redundant system to identify errors in university transcripts (Contte and Jacobs, 1997). The task involved comparing a checklist with an official transcript. Participants were randomly allocated to a condition in which they were told that they would work alone or as part of a system that involved either a computer, a faculty member, or a faculty member and a peer. Prior to completing the task, participants in the


redundant systems rated their confidence in the performance of the other versions of the system. These ratings were used to determine whether the redundant systems were perceived as reliable or unreliable. The results indicated that participants worked faster but made errors more frequently when in a perceived reliable, redundant system compared to a condition where they worked in isolation or in a perceived unreliable, redundant system.

Another study examined participant performance on a proofreading task when working alone versus in pairs (Nihei et al., 2002). Participants read two passages that contained both surface and contextual errors. Participants were instructed to detect and correct misprints. Those participants who worked in pairs were each asked to read the passages and then discuss the errors detected. When participants worked in pairs, they detected a significantly greater number of surface errors, whereas individuals detected significantly more contextual errors.

Two studies conducted in 1963 and 1965 examined the effect of redundancy on speed and reliability on a visual monitoring task (Brown, 1963; Brown and Fox, 1965). In both studies, participants observed a display on which white circles were illuminated in different patterns. Participants were provided with a card with ‘critical’ light combinations and were instructed to release a telegraph key as fast as possible when these light combinations appeared on the display. These ‘critical’ stimuli were combined with non-critical stimuli. Stimulus complexity was manipulated by varying the number of lights displayed for each stimulus. Participants were recruited in pairs and completed the task in both redundant and nonredundant conditions. In the redundant condition, participants responded to all of the stimuli as a team. However, in the nonredundant condition, each participant responded independently to half of the stimuli. The first of these two studies noted that participants made more false positive errors (i.e., gave a response to a non-critical stimulus) in the redundant condition during low and intermediate levels of difficulty, whereas they made more false positive errors in the nonredundant condition during high levels of difficulty (Brown, 1963). The second study found that there were more negative errors (i.e., a failure to respond to a critical stimulus) in the nonredundant condition compared to the redundant condition. However, compared to the nonredundant condition, there were more false positive errors in the redundant condition (Brown and Fox, 1965).

4. Discussion

Overall, our search revealed few quantitative studies that have examined the effectiveness of double-checking or similar two-person monitoring and verification processes across several disciplines. The studies from aviation, chemical manufacturing, and psychology demonstrated that the performance of two people checking was not superior to one person in detecting errors. As it currently stands, the evidence-base for double-checking as an effective practice is small and of both mixed quality and findings. Accordingly, it is difficult to conclude whether double-checking is an effective practice for error detection.

In both aviation studies, the authors suggest that individual differences may play a role in a pilot’s interaction with automation (Mosier et al., 2001; Skitka et al., 2000). For example, factors such as self-confidence, workload and cognitive load have been found to influence human performance with automated systems (Parasuraman and Riley, 1997). Such factors and how they relate to checking performance and the presence of a second person were not examined in the studies of double-checking in aviation nor those from chemical manufacturing and psychology. However, three studies of automation monitoring in the context of chemical manufacturing did find that double-checking was undermined by social loafing (Cymek, 2018; Cymek et al., 2016; Domeinski et al., 2007). That is, participants exerted less effort when they were aware that they were working with others who they expected would take greater responsibility during the task. Future research should extend these findings by investigating how individual, interpersonal, and environmental factors influence the effectiveness of double-checking.

All of the studies except one recruited university students rather than industry professionals (Mosier et al., 2001). Although Skitka et al. (2000) suggest that previous studies have not found differences in student and pilot samples for error rates and overall response patterns across automation failures (Mosier and Skitka, 1996; Riley, 1996), the generalizability of the study findings is limited. Additionally, none of the studies published power calculations, and they were most likely underpowered given both the design of the studies (i.e., between-groups design, mixed factorial design) and the small sample sizes.

4.1. Types of double-checking

There are different types of two-person checks. Independent double-checking is often mandated when nurses administer certain types of medication and specifies that each nurse separately checks medication details. Collaborative double-checking is another type often performed in aviation, where one checker calls out information which is then confirmed by a second checker. It could be the case that collaborative double-checking is more effective for some safety-critical tasks, whereas independent double-checking could be effective for others. However, published studies comparing the factors involved, as well as the effectiveness of these different forms of double-checking, are rare. Importantly, different types of double-checking should be compared with single-checking since the interaction between two people may involve factors that are not present when one person completes the same tasks. If such factors are not controlled, they may undermine any added safety benefits of two-person checking practices.

Few studies have investigated the fidelity of checking processes as performed in real-world settings. Our recent field study found that when nurses performed double-checks during medication administration, they almost always failed to independently check as required by hospital policy (Westbrook et al., 2020). Instead, double-checks involved collaboration and information priming between the two nurses. This study also demonstrated that the cost of a second nurse to double-check medications at one 340-bed hospital was over $7000 per day, or $2.7M annually. Nurse costs associated with double-checking processes in Australia, which has over 740 hospitals, are in the order of $5M per day nationwide.

In a report from NASA (Dismukes and Berman, 2010), flight crew procedures during 60 airline operations were directly observed to examine why checklists sometimes fail to catch errors and equipment malfunctions. Typically, the pilot monitoring reads each checklist item, and the pilot flying checks the item and verbally responds that it is set correctly, and the pilot monitoring then cross-checks the item. The findings indicated that there were 585 failures in this process. In 43 instances, a pilot confirmed without visually inspecting the item, and in 42 instances items from the checklist were omitted or called out incorrectly. In 113 instances, the required verification was not obtained. For example, a first officer set and called out a new altitude, but the captain was distracted and did not verify the new altitude on the primary flight display. Both of these studies indicate that checking processes are not always completed as recommended.

Direct observational studies of checking processes in real-world practice, such as these two, are rare. The small evidence-base makes it difficult to determine whether a failure of double-checking to detect errors is due to a failure in the fidelity of the process (i.e., whether it was performed as intended), due to the influence of environmental factors, or because double-checking itself provides no safety benefits. Further investigation is required to generate new evidence about why failures in adherence to checking procedures occur, and how improvements can be made to optimise checking. The true extent of failures in checking is largely unknown due to a lack of research evidence. As Dismukes and Berman (2010) observed, only 18% of observed process failures by aircrew were corrected, suggesting that many failures go unnoticed.


4.2. Types of checking tasks

The type of checking task must also be identified to evaluate the significance of differences between tasks, the cognitive resources required to complete the tasks, and the factors that may influence the effective completion of tasks. Tasks can be classified as mechanistic (e.g., comparing a prescription against the label of an IV bag), abstract (e.g., drawing on knowledge to realise a medication needs to be diluted in a different solution) (Pfeiffer et al., 2020; White et al., 2010), or monitoring tasks (e.g., monitoring a system that autonomously controls chemical processes) (Cymek, 2018). The distinction between mechanistic and abstract checking tasks is an important one and may be a useful basis for identifying steps in double-checking that are at greater risk of disruption. For example, declarative memory retrieval is more negatively affected by stress than procedural memory retrieval (Schwabe et al., 2012). Therefore, stress may threaten abstract tasks more than mechanistic tasks. It is essential to investigate checking processes considering these factors if we are to understand the processes conceptually and determine their potential effectiveness to reduce error.

4.3. Future directions

The complexity of high-consequence environments, such as healthcare settings, often makes it difficult and unethical to explore the impact of various factors on processes in real-world observational studies. However, simulation provides a safe and controlled environment whereby researchers can manipulate and control variables to examine individual behaviour, generate theory, and evaluate interventions (Lamé and Dixon-Woods, 2020). Greater use of laboratory experiments and simulation studies to systematically investigate double-checking is required to determine if and when double-checking strategies are likely to be effective. Such evidence should then be used as the basis to design checking policies which are responsive to specific task contexts and consider the roles of both humans and technology.

There are important factors that cannot be reproduced in the lab, such as organisational factors (e.g., safety culture). Instead, direct observational studies and qualitative interviews may be more appropriate methods to first explore clinical and technical practice in the local context, and second to elucidate the impact of organisation-level factors on the effectiveness of checking processes. The ‘work-as-imagined’ versus ‘work-as-done’ distinction is an important one for policy-makers to acknowledge, because if checking guidelines are disconnected from real-world work processes, they are unlikely to be followed and will consequently prove ineffective at improving safety (Clay-Williams et al., 2015). Additionally, policy for how and when specific checking processes should be performed must be supported by strong empirical evidence. However, we can infer from the findings of the present synthesis that there is a disconnect between policy and empirical evidence, given that there is currently a limited evidence-base of mixed study quality [...]

[...] environments requires improved evidence and understanding of human checking performance. Thus, future investigations which incorporate the use of technology in the checking process should be considered.

4.4. Strengths and limitations

This is the first systematic review to synthesise evidence for the effectiveness of double-checking across high-consequence industries outside of healthcare. Our review had several limitations. Our search was limited to English-language studies. Grey literature was not included. Heterogeneity in the studies limited us to a narrative summary of the included studies.

5. Conclusion

The nine studies from aviation, chemical manufacturing, and psychology do not support the adage that two people are better than one. Redundancy in high-consequence industries may be advantageous for technical systems, but it is less clear whether this is similar for human redundancy in healthcare, aviation, and chemical engineering. It is difficult to determine the effectiveness of double-checking based on the current evidence-base that is both small and of mixed quality. Although safety checks are an essential part of workflows for high-consequence tasks across many industries and sectors, there is little empirical evidence to improve their design and application. Therefore, further studies are required to build a stronger evidence-base from which checking policies can be optimised for specific task contexts.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Johanna Westbrook is supported by a NHMRC Elizabeth Blackburn Leadership Investigator Grant (1174021).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.
and findings for the effectiveness of double-checking. A stronger org/10.1016/j.apergo.2022.103906.
evidence-base must be established by researchers, to inform guidelines
for checking practices that are compatible with and effective in References
real-world settings.
A further future consideration is the role of technology, which is Brown, D.W., 1963. The effect of observer redundancy on display monitoring equipment.
J. Psychol. 56, 413–419. https://doi.org/10.1080/00223980.1963.9916656.
increasingly being used to automate checking processes, reducing the Brown, D.W., Fox, G.H., 1965. The effect of observer redundancy and task difficulty on
demand for some human checking. However, while technology may be display monitoring efficiency. J. Psychol. 59, 267–274. https://doi.org/10.1080/
useful for ‘matching tasks’, higher-order checking tasks may be more 00223980.1965.10544611.
Clay-Williams, R., Hounsgaard, J., Hollnagel, E., 2015. Where the rubber meets the road:
problematic. The integration of technology as part of the checking using FRAM to align work-as-imagined with work-as-done when implementing
process can lead to a false sense of security and automation bias and may clinical guidelines. Implement. Sci. 10 (1), 125. https://doi.org/10.1186/s13012-
result in an individual placing undue confidence in an automated pro­ 015-0317-y.
Contte, J.M., Jacobs, R.R., 1997. Redundant systems influences on performance. Hum.
cess and failure to adequately question or check (Mosier et al., 2001;
Perform. 10 (4), 361–380. https://doi.org/10.1207/s15327043hup1004_3.
Skitka et al., 2000). For example, Lyell et al. (2017) found staff were Cymek, D.H., 2018. Redundant automation monitoring: four eyes don’t see more than
more likely to accept false positive alerts generated by an electronic two, if everyone turns a blind eye. Hum. Factors 60 (7), 902–921. https://doi.org/
prescribing system, such that they assumed the computer ‘must be right’ 10.1177/0018720818781192.
Cymek, D.H., Jahn, S., Manzey, D.H., 2016. Monitoring and cross-checking automation:
and failed to check (Lyell et al., 2017). Understanding the types of do four eyes see more than two? Proc. Hum. Factors Ergon. Soc. Annu. Meet. 60 (1),
checking tasks that can safely be automated within complex workplace 143–147. https://doi.org/10.1177/1541931213601033.


Dismukes, R.K., Berman, B., 2010. Checklists and Monitoring in the Cockpit: Why Crucial Defenses Sometimes Fail (NASA/TM-2010-216396). Retrieved from http://human-factors.arc.nasa.gov/publications/NASA-TM-2010-216396.pdf.
Domeinski, J., Wagner, R., Schöbel, M., Manzey, D., 2007. Human redundancy in automation monitoring: effects of social loafing and social compensation. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 51 (10), 587–591. https://doi.org/10.1177/154193120705101004.
Hawker, S., Payne, S., Kerr, C., Hardey, M., Powell, J., 2002. Appraising the evidence: reviewing disparate data systematically. Qual. Health Res. 12 (9), 1284–1299. https://doi.org/10.1177/1049732302238251.
Hofmann, D.A., Jacobs, R., Landy, F., 1995. High reliability process industries: individual, micro, and macro organizational influences on safety performance. J. Saf. Res. 26 (3), 131–149. https://doi.org/10.1016/0022-4375(95)00011-E.
Hopewell, S., Clarke, M., Lefebvre, C., Scherer, R., 2007. Handsearching versus electronic searching to identify reports of randomized trials. Cochrane Database Syst. Rev. 2007 (2), MR000001. https://doi.org/10.1002/14651858.MR000001.pub2.
ISMP, 2003. The virtues of independent double checks—they really are worth your time. Med. Safety Alert. 8 (5), 1. Retrieved from https://pdfs.semanticscholar.org/d02a/90a1d7f64fb931f4af14bd1ebcf1bfa22699.pdf.
ISMP, 2013. Independent Double Checks: Undervalued and Misused: Selective Use of This Strategy Can Play an Important Role in Medication Safety. Retrieved from https://www.ismp.org/resources/independent-double-checks-undervalued-and-misused-selective-use-strategy-can-play.
Keers, R.N., Williams, S.D., Cooke, J., Ashcroft, D.M., 2013. Prevalence and nature of medication administration errors in health care settings: a systematic review of direct observational evidence. Ann. Pharmacother. 47 (2), 237–256. https://doi.org/10.1345/aph.1R147.
Koyama, A.K., Maddox, C.-S.S., Li, L., Bucknall, T., Westbrook, J.I., 2020. Effectiveness of double checking to reduce medication administration errors: a systematic review. BMJ Qual. Saf. 29 (7), 595–603. https://doi.org/10.1136/bmjqs-2019-009552.
Lamé, G., Dixon-Woods, M., 2020. Using clinical simulation to study how to improve quality and safety in healthcare. BMJ Simul. Technol. Enhance Learn. 6 (2), 87–94. https://doi.org/10.1136/bmjstel-2018-000370.
Lekka, C., 2011. High Reliability Organisations A Review of the Literature. Retrieved from https://www.hse.gov.uk/research/rrpdf/rr899.pdf.
Lyell, D., Magrabi, F., Raban, M.Z., Pont, L.G., Baysari, M.T., Day, R.O., Coiera, E., 2017. Automation bias in electronic prescribing. BMC Med. Inf. Decis. Making 17 (1), 28. https://doi.org/10.1186/s12911-017-0425-5.
Mosier, K., Skitka, L., 1996. Human decision makers and automated decision aids: made for each other? In: Automation and Human Performance: Theory and Applications 40, 201–220.
Mosier, K.L., Skitka, L.J., Dunbar, M., McDonnell, L., 2001. Aircrews and automation bias: the advantages of teamwork? Int. J. Aviat. Psychol. 11 (1), 1–14. https://doi.org/10.1207/S15327108IJAP1101_1.
Nihei, Y., Terashima, M., Suzuki, I., Morikawa, S., 2002. Why are four eyes better than two? Effects of collaboration on the detection of errors in proofreading. Jpn. Psychol. Res. 44 (3), 173–179. https://doi.org/10.1111/1468-5884.00020.
Page, M.J., McKenzie, J.E., Bossuyt, P.M., Boutron, I., Hoffmann, T.C., Mulrow, C.D., Moher, D., 2021. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372, n71. https://doi.org/10.1136/bmj.n71.
Parasuraman, R., Riley, V., 1997. Humans and automation: use, misuse, disuse, abuse. Hum. Factors 39 (2), 230–253. https://doi.org/10.1518/001872097778543886.
Pfeiffer, Y., Zimmermann, C., Schwappach, D.L.B., 2020. What are we doing when we double check? BMJ Qual. Saf. 29 (7), 536–540. https://doi.org/10.1136/bmjqs-2019-009680.
Riley, V., 1996. Operator reliance on automation: theory and data. In: Automation and Human Performance: Theory and Applications. Lawrence Erlbaum Associates, Inc, Hillsdale, NJ, US, pp. 19–35.
Roberts, K.H., 1990. Some characteristics of one type of high reliability organization. Organ. Sci. 1 (2), 160–176. Retrieved from http://www.jstor.org/stable/2635060.
Schwabe, L., Joëls, M., Roozendaal, B., Wolf, O.T., Oitzl, M.S., 2012. Stress effects on memory: an update and integration. Neurosci. Biobehav. Rev. 36 (7), 1740–1749. https://doi.org/10.1016/j.neubiorev.2011.07.002.
Skitka, L.J., Mosier, K.L., Burdick, M., Rosenblatt, B., 2000. Automation bias and errors: are crews better than individuals? Int. J. Aerospace Psychol. 10 (1), 85–97. https://doi.org/10.1207/s15327108ijap1001_5.
Toft, B., Mascie-Taylor, H., 2005. Involuntary automaticity: a work-system induced risk to safe health care. Health Serv. Manag. Res. 18 (4), 211–216. https://doi.org/10.1258/095148405774518615.
Westbrook, J.I., Li, L., Raban, M.Z., Woods, A., Koyama, A.K., Baysari, M.T., White, L., 2020. Associations between Double-Checking and Medication Administration Errors: a Direct Observational Study of Paediatric Inpatients. BMJ Quality & Safety. https://doi.org/10.1136/bmjqs-2020-011473 bmjqs-2020-011473.
White, R.E., Trbovich, P.L., Easty, A.C., Savage, P., Trip, K., Hyland, S., 2010. Checking it twice: an evaluation of checklists for detecting medication errors at the bedside using a chemotherapy model. Qual. Health Care 19 (6), 562–567. https://doi.org/10.1136/qshc.2009.032862.
