
British Journal of Anaesthesia, 120 (2): 264–273 (2018)

doi: 10.1016/j.bja.2017.09.007
Advance Access Publication Date: 24 November 2017
Review Article

Competency-based assessment tools for regional anaesthesia: a narrative review

A. Chuan1,2,*, A. S. Wan2, C. F. Royse3,4 and K. Forrest5

1South Western Sydney Clinical School, University of New South Wales, Sydney, Australia, 2Department of Anaesthesia, Liverpool Hospital, Sydney, Australia, 3Department of Surgery, University of Melbourne, Melbourne, Australia, 4Department of Anaesthesia, Royal Melbourne Hospital, Melbourne, Australia and 5Faculty of Health Sciences and Medicine, Bond University, Queensland, Australia

*Corresponding author. E-mail: dr.chuan@iinet.net.au

Editorial decision: October 26, 2017; Accepted: December 1, 2017

© 2017 British Journal of Anaesthesia. Published by Elsevier Ltd. All rights reserved. For Permissions, please email: permissions@elsevier.com

Abstract
Competency-based assessment tools are used in regional anaesthesia to measure the performance of study participants, trainees, and consultants. This narrative review was performed to appraise currently published assessment tools for regional anaesthesia. A literature search found 397 citations, of which 28 peer-reviewed studies met the inclusion criteria of primary psychometric evaluation of assessment tools for regional anaesthesia. The included studies were diverse in the type of assessment and the skill set being assessed. The types of assessments included multiple-choice questions, hand-motion analysis, cumulative sum, visuospatial and psychomotor screening, checklists, and global rating scales. The skill sets that were assessed ranged from holistic regional anaesthesia technical and non-technical performance observed at the bedside, to isolated part-tasks, such as needle tip visualisation under ultrasound. To evaluate validity and reliability, we compared the studies against published medical education consensus statements on ideal assessment tools. We discuss the relative merits of different tools when used to assess regional anaesthesia, the importance of psychometrically robust assessment tools in competency-based anaesthesia education, and directions for future education research in regional anaesthesia.

Keywords: anaesthesia, conduction; credentialing; psychometrics; reproducibility of results

The assessment of complex anaesthesia procedures, such as regional anaesthesia, requires evaluation of disparate knowledge, skill sets, and professional attributes. Practically, this is achieved by the use of assessment tools.1 With the international shift towards competency-based postgraduate medical education, these tools assume a central role in current anaesthesia curricula by assessing the trainees' performances against predefined learning objectives, standards of performance, and behaviour.1–6 The performances can range from the entire nerve block procedure on patients in a clinical setting, to part-tasks relevant to regional anaesthesia, such as anatomy knowledge and ultrasound-guided needle skills in simulation settings.

The design, structure, and quantitative characteristics of an assessment tool are defined as its psychometric properties. Evaluating a tool's psychometrics ('assessing the assessment tool') is necessary for acceptance by all stakeholders. These properties include validity, reliability, feasibility, educational stimulus, and acceptability. For clinicians, validity and reliability are required to demonstrate robustness and confidence
in assessment results. In research studies, validated and reliable tools are needed to measure performance changes after training interventions and for serial measurement of performance over time.7

Until relatively recently, the principles of assessment and the methods of psychometric evaluation were not well disseminated beyond the medical education literature. To improve the consistency and robustness of studies that sought to design and evaluate assessment tools, the 2010 Ottawa Conference for Competence in Medicine and Healthcare published a consensus statement on ideal assessment tool properties,8 describing a framework of reporting standards and statistical techniques used in psychometric evaluation.9 We adopted this framework to appraise the published regional anaesthesia assessment tools. This narrative review discusses the type and purpose of current tools, and categorizes their psychometric properties, implications for clinical adoption, and priorities for research.

Methods

Criteria for ideal assessment tools

A utility index of psychometric properties of the ideal competency-based assessment tool was first proposed by Van der Vleuten10 in 1996. This was officially adopted as a consensus statement at the 2010 Ottawa Conference as the document 'Criteria for good assessment: consensus statement and recommendations from the Ottawa 2010 Conference'.8 The properties that defined a good assessment tool included validity, reliability, feasibility, educational stimulus, and acceptability of a tool by stakeholders. Table 1 summarizes these properties with a specific focus on regional anaesthesia.

Table 1 Psychometric properties of an ideal competency-based regional anaesthesia assessment tool. Adapted from the 2010 Ottawa Conference for Competence in Medicine and Healthcare 'Criteria for good assessment: consensus statement and recommendations from the Ottawa 2010 Conference' by Norcini and colleagues.8 Further definitions from Bould and colleagues,1 Ahmed and colleagues,4 Van der Vleuten,10 and Gallagher and colleagues11

Validity: The tool measures what it purports to measure. A body of evidence, built up over multiple studies, supports validity. Must also demonstrate reliability.
Face validity: Suitability of the tool to the real-life regional anaesthesia clinical reality.
Content validity: Criterion-based regional anaesthesia knowledge, skill set, or behaviour being tested.
Construct validity: Test scores differentiate inexperienced and experienced proceduralists.
Discriminate validity: Test scores differentiate novice vs expert proceduralists.
Concurrent validity: Test scores are consistent with an external and objective outcome. Examples have included measures of block performance, such as success, time to onset, number of needle passes, and complication rates.
Predictive validity: Test scores are predictive of clinical practice. An example is in vitro simulation scores predicting in vivo block performance.
Reliability: Reproducibility of test scores if repeated under similar circumstances. External reliability is more important than internal reliability. In practice, trainees are scored by different assessors and over multiple occasions throughout training; thus, demonstration of external reliability is essential for validity and acceptability.
Internal reliability: Different trainees scoring consistently in the same section of the tool.
External reliability: Different assessors scoring consistently for the same trainee performance. Also called between-assessor or inter-rater agreement.
Test–retest reliability: Consistency of scores over different times for the same performance.
Feasibility: Minimal time, logistics, financial costs, and assessor training required.
Educational catalyst: Allows structured feedback, deliberate practice, and reflection (formative assessment).
Acceptability: Acceptability is dependent on the stakeholder and purpose of the tool. For example, medical registration boards accept tools if there is evidence of high reliability and validity; feasibility and educational effect are not as important. Clinical tutors will accept a tool for its educational benefit and high feasibility; other properties are not as critical.

In practice, every assessment tool is a compromise between competing properties. This interdependent balance is most pertinent for validity and reliability. At a minimum, this requires evidence that the important skill sets of that task are examined (face and content validity), that the tool differentiates trainees of different abilities (construct validity), and that scores are reproducible (reliability).12,13 Of the different types of reliability, external reliability is more important in a clinical context, as it concerns consistency of scoring among different faculty members who assess trainees.

Content validity is typically assured by anchoring with expert consensus guidelines. Examples include the regional anaesthesia learning objectives in the 2015 Accreditation Council for Graduate Medical Education Milestone project.14 For ultrasound-guided techniques, the joint committees of the American Society of Regional Anesthesia and Pain Medicine and the European Society of Regional Anaesthesia and Pain Therapy describe minimum competencies.15

Framework for psychometric evaluation

Psychometric evaluation is an examination of an assessment tool's properties against how well it conforms to its purpose, often using quantitative statistical analyses. In research studies designing new assessment tools, a primary aim is to evaluate reliability. However, previous reviews have noted limitations in the statistical analyses of reliability used by published studies.4,16 These limitations include heterogeneity and incorrect use of tests, precluding any meta-analyses of tools. In an effort to improve the quality of medical education research, an additional 2010 Ottawa Conference statement on
reliability methods was published in the document 'Research in Assessment'.9 Tests of reliability were classified as derivatives of the classical test theory, generalisability theory, and item-response theory.

The majority of tests familiar to anaesthetists are based on the classical test theory. It is assumed that the observed score X is the sum of the true score T and any errors in measurement E, described by the formula X = T + E. Reliability coefficients estimate this amount of error; higher coefficients suggest low errors and greater reproducibility in observed test scores.13 Examples of classical tests include split-half methodology, Cronbach's α, Cohen's κ, and the intra-class correlation coefficient (ICC).
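As a minimal illustration of this model (a simulation sketch with arbitrary variances, not data from any included study), a reliability coefficient can be read as the proportion of observed-score variance attributable to true scores:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000                                   # simulated trainees (arbitrary)
T = rng.normal(loc=50, scale=10, size=n)   # latent true scores
E = rng.normal(loc=0, scale=5, size=n)     # measurement error
X = T + E                                  # observed scores: X = T + E

# Reliability = var(T)/var(X); here approximately 10^2/(10^2 + 5^2) = 0.80
print(T.var() / X.var())
```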
Split-half methodology is based on the correlation between scores from one-half of questions in a test against scores from the other half. This is most appropriately used in large banks of multiple-choice questions (MCQs). Split-half methods can identify questions with low correlations, prompting a rewrite or removal.

Cronbach's α is a parallel test design applicable for continuous data, and is, in turn, the general form of the Kuder–Richardson 20 test used for dichotomous data.17 Despite being a common test, Cronbach's α has been incorrectly used and its limitations underappreciated. Firstly, Cronbach's α is artificially increased when used for criteria-based assessments and by having more items in the test.9 Also, Cronbach's α is sensitive to individual inter-item correlations; a subset of items may weigh disproportionally on the overall average Cronbach's α value for the entire tool. Equally, the test is susceptible to missing data with resultant low Cronbach's α values. There are recommendations to calculate standard errors of mean with 95% confidence intervals to provide information on the absolute agreement of scores.9,11
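The two classical coefficients above can be sketched in a few lines; the items-by-trainees matrix below is a random placeholder, and the Spearman–Brown step-up applied to the split-half correlation is a standard companion technique rather than one named in this review:

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.integers(0, 2, size=(40, 100))  # 40 items x 100 trainees (placeholder)

# Split-half: correlate odd-item totals against even-item totals,
# then step up with the Spearman-Brown correction.
half1 = scores[0::2].sum(axis=0)
half2 = scores[1::2].sum(axis=0)
r = np.corrcoef(half1, half2)[0, 1]
split_half = 2 * r / (1 + r)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance).
# Independent random items give alpha near 0; correlated items raise it.
k = scores.shape[0]
item_var = scores.var(axis=1, ddof=1).sum()
total_var = scores.sum(axis=0).var(ddof=1)
alpha = k / (k - 1) * (1 - item_var / total_var)

print(split_half, alpha)
```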
The analysis of external reliability can take several forms. In the simplest, the test–retest method measures scoring consistency at initial testing vs retesting at a later time by the same assessor. As a measure of score stability, there is a balance between retesting too early (recall and learning bias) and too late (assessors may have gained experience in scoring). For reliability between different assessors, Cohen's κ, ICC, and Pearson correlation tests have been used.
Cohen's κ is typically used when two assessors are involved. It is a more robust test than simple percentage comparisons, as Cohen's κ takes into account the possibility of chance agreement. For three or more assessors, the ICC statistic can be used. As ICC is a form of analysis of variance (ANOVA), the test can correct for chance variance, and is able to measure consistency and absolute agreement between scores.18 This is particularly important when compared with Pearson correlation, which can only measure consistency. In practical terms, Pearson correlation sets the upper limit of ICC values in ideal situations, but the calculated ICC is much lower. Studies that only use Pearson correlation, therefore, risk overstating the reliability of the assessment tool.19
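For two assessors giving dichotomous judgements, Cohen's κ is simple to compute directly; a minimal sketch with invented pass/fail ratings:

```python
import numpy as np

# Invented pass (1) / fail (0) judgements by two assessors on ten performances
a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
b = np.array([1, 0, 0, 1, 0, 1, 1, 1, 1, 1])

p_o = np.mean(a == b)                      # observed agreement
p_e = (np.mean(a) * np.mean(b)             # chance agreement on 'pass'
       + np.mean(1 - a) * np.mean(1 - b))  # plus chance agreement on 'fail'

kappa = (p_o - p_e) / (1 - p_e)            # kappa = (po - pe) / (1 - pe)
print(kappa)
```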
ICC itself has six statistical forms, and the choice of model will markedly change the result.20,21 Shrout and Fleiss22 demonstrated that the six ICC models produce results ranging from 0.17 to 0.91 using the same raw data. As an example of using ICC to analyse a regional anaesthesia assessment tool, the following factors are considered: is the assessed task a random selection of all block procedures, and are the assessors randomly chosen from all possible assessors (one vs two way, fixed vs random effects); is the test designed to be scored by a single assessor, or two or more (single vs average measures); and does the numerical value of scores carry discrete information (absolute agreement)?
The most stringent model is ICC (2A, 1), used when a single randomly chosen assessor scores a randomly chosen trainee performing a randomly selected regional anaesthesia task, with absolute agreement in scores. Cronbach's α is simply the consistency model ICC (C, k).21 As a rule of thumb, ICC should be 0.80 or higher for high-stakes exams, and only marginally less for formative assessments.10,13,19
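As a sketch of how this model is computed (ICC (2A, 1) here corresponds to ICC(2,1) of Shrout and Fleiss22: two-way random effects, absolute agreement, single measure; the ratings matrix is a placeholder):

```python
import numpy as np

# Placeholder ratings: n performances (rows) scored by k assessors (columns)
Y = np.array([[7., 8., 6.],
              [5., 5., 4.],
              [9., 9., 8.],
              [4., 5., 3.],
              [6., 7., 6.]])
n, k = Y.shape
grand = Y.mean()

MSR = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # between performances
MSC = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # between assessors
SSE = ((Y - Y.mean(axis=1, keepdims=True)
          - Y.mean(axis=0, keepdims=True) + grand) ** 2).sum()
MSE = SSE / ((n - 1) * (k - 1))                            # residual

# Shrout & Fleiss ICC(2,1): penalises assessor disagreement in absolute terms
icc_2_1 = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
print(icc_2_1)
```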
The generalisability theory is the second major statistical technique to evaluate reliability. This theory is also a form of ANOVA and calculates the ratio of true variance (true score) to total variance (true score and errors). Analogous to ICC analysis, the assumptions used when performing generalisability modelling have significant effects on the resultant coefficient.9,23 Generalisability is uncommon in the anaesthesia literature. It has been used previously for paediatric anaesthesia skills24 and validation of the mini-clinical evaluation examination tool in anaesthesia practice.25 One advantage over classical tests includes identifying the source of scoring inconsistency from differences in patient difficulty (case specificity), or from differences in assessors (inter-rater error variance).13,26 For example, generalisability can provide an indication of whether an otherwise satisfactory trainee received a low score because of a technically challenging patient, or whether scoring was biased by a 'hawkish' assessor. Another advantage is reliability modelling, akin to a sample-size power calculation. Based on the raw data set, a decision study estimates the effect on reliability when changing the number of assessments performed vs the number of assessors required.9
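A decision study can be sketched from estimated variance components; the component values below are illustrative only, standing in for estimates that a real G study would derive from the raw data set:

```python
# Illustrative variance components for a trainee x assessor design
var_trainee = 4.0    # true-score (trainee) variance
var_assessor = 1.0   # assessor stringency ('hawk'/'dove') variance
var_residual = 2.0   # interaction and unexplained error variance

def phi(n_assessors):
    """Projected absolute-agreement coefficient with n_assessors per trainee."""
    error = (var_assessor + var_residual) / n_assessors
    return var_trainee / (var_trainee + error)

# Reliability modelling: the coefficient rises as assessors are added
for n in (1, 2, 4):
    print(n, round(phi(n), 2))
```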
The third option to create assessment tools with inherent reliability is to use item-response theory. This method has been used recently to create a 47-item multiple-choice assessment tool for sono-anatomy relevant for ultrasound-guided regional anaesthesia (UGRA).27 Unlike the previous statistical methods, item response is based on the analysis of individual items in the tool rather than scores by assessors. Each item is assessed for its probability of a correct answer vs trainee ability in an iterative process of item calibration and proficiency estimation. The discriminatory power of each item is determined by the location index, which is a graphical representation of the 'item characteristic curve'. The addition of Rasch one-parameter mathematical modelling allows the identification of poorly fitting items, which may then be examined for causes of poor discrimination, and amended appropriately or deleted. More complex two- and three-parameter logistic models allow better discrimination between trainees of variable ability, but require very large sampling sizes.9
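A sketch of the underlying one-parameter (Rasch) model: the probability of a correct answer depends only on the difference between trainee ability θ and item difficulty b, and the item characteristic curve is this probability plotted against ability (the values below are invented):

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch model: P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

item_difficulty = 0.5                 # invented location index for one item
abilities = np.linspace(-3, 3, 7)     # trainee ability scale (logits)

# Item characteristic curve: P(correct) rises with trainee ability;
# calibration iterates between estimating item difficulty and ability.
for theta in abilities:
    print(f"theta={theta:+.1f}  P(correct)={rasch_p(theta, item_difficulty):.2f}")
```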
Literature search

We undertook an electronic literature search for articles on regional anaesthesia assessment tools in PubMed, Embase, and Medline. Articles were deemed eligible for inclusion if the primary aim was psychometric evaluation of competency-based assessment tools for regional anaesthesia knowledge, skills, or behaviours. The following search terms and medical subject headings were used: regional anaesthesia, conduction anaesthesia, plexus block, brachial plexus block, femoral nerve block, sciatic-nerve block, neuraxial block, spinal anaesthesia, epidural anaesthesia, paravertebral block, human, clinical competence, and not limited for year. The search methodology
was registered with PROSPERO (CRD42017068029) and performed in May 2017.

All identified citations were independently reviewed by authors A.C. and A.S.W. Potentially relevant records, based on title and abstract, were selected. Earmarked citations, chosen by either of the blinded authors, were combined into a provisional list. We also manually searched references within articles for any relevant citations not already included. Only full-text articles in the English language and published as peer-reviewed journal articles were accepted. These authors then jointly scrutinized the methodology of each full-text article against the inclusion criteria, to reach a final list. Each article in this final list was then compared against the criteria for ideal assessment tools. The framework for psychometric evaluation was then applied to each article's statistical methods and results.

Results

There were a total of 397 records identified, comprising 391 citations from the literature search and six citations from bibliographies, as presented in the PRISMA flow diagram (Fig 1).
[Fig 1: study selection flow. Identification: 391 records identified through database searching and six additional records through other sources; 350 records after duplicates removed. Screening: 350 records screened by title and abstract; 265 excluded. Eligibility: 85 full-text articles assessed for eligibility; 50 excluded (did not perform psychometric analysis of assessment tool; lumbar puncture only; referred to earlier article where primary validation was performed); seven studies of interest (psychometric analysis of assessment tool was a secondary aim). Included: 28 studies included in qualitative synthesis.]
Fig 1. PRISMA flow diagram of the study selection process. Six additional records identified through other sources were Smith and colleagues (2012),28 Duce and colleagues (2016),29 de Oliveira Filho and colleagues (2008),30 de Oliveira Filho (2002),31 Schuepfer and colleagues (2000),32 and Friedman and colleagues (2008).33

Twenty-eight articles met the inclusion criteria. Multiple types of assessment tools were used, including MCQs, hand-motion analysis, cumulative sum (CUSUM), visuospatial and psychomotor screening, checklists, and global rating scales (GRSs). To allow for an easier comparison of the quality of tools within each type of regional anaesthesia procedure, the articles were categorized by the purpose of assessment: use in part-trainer tasks for regional anaesthesia (Supplementary Table 1); neuraxial-only regional anaesthesia clinical procedures (Supplementary Table 2); and peripheral regional anaesthesia clinical procedures, including one tool assessing peripheral and neuraxial blocks (Supplementary Table 3). An in-depth explanation of the advantages and disadvantages of specific types of assessment is provided next.

In seven other articles, an assessment tool was purposefully designed for that study, but psychometric evaluation was not the primary aim. These seven articles are in Supplementary Table 4 as studies of interest, as their methodologies did include some information regarding the tool's psychometric properties.

Multiple-choice questions

MCQ tests have the advantages of testing multiple knowledge areas and efficiently testing large groups of trainees. The inclusion of clinical information into MCQs makes them context rich and forces trainees to use decision-making skills to consider clinical issues alongside theoretical considerations.2 Uniquely in regional anaesthesia, Woodworth and colleagues27 reduced an initial 81-item MCQ to 50 items through expert consultation. This was validated by the item-response theory and Rasch modelling to a robust final 47-item MCQ. This final form has evidence for face, content, and construct validity and reliability in testing sono-anatomy knowledge for femoral, sciatic, and above-clavicle brachial plexus blocks.

Hand-motion analysis

Motion analysis uses the Imperial College Surgical Assessment Device (Polhemus, Vermont, USA, and Imperial College London, UK),57 a radiofrequency tracking system to monitor dexterity via sensor-embedded gloves worn by the proceduralist. Three end points are measured: time taken to perform a task, number of movements, and path length, as a surrogate of economy of motion. The attraction of hand motion lies with objective end points, measured by a computerized tracking system. Two studies in regional anaesthesia demonstrated construct validity between novices and experts performing ultrasound-guided supraclavicular brachial plexus blocks (Chin and colleagues49) and in labour epidurals (Hayter and colleagues43). In both studies, a secondary aim was concurrent validation with previously used combined checklist and GRSs.42,48 However, these were assessed using Pearson or Spearman correlations.
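The three end points are straightforward to derive from tracking data; the following sketch assumes a stream of timestamped 3D sensor positions, and the displacement threshold used to count discrete movements is an invented parameter, not one specified by the device:

```python
import numpy as np

# Placeholder tracking data: timestamps (s) and 3D sensor positions (mm)
t = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
pos = np.array([[0, 0, 0], [2, 1, 0], [5, 1, 1],
                [5, 1, 1], [7, 3, 1], [9, 4, 2]], dtype=float)

time_taken = t[-1] - t[0]                 # end point 1: task duration

steps = np.diff(pos, axis=0)              # displacement between samples
step_len = np.linalg.norm(steps, axis=1)
path_length = step_len.sum()              # end point 3: economy-of-motion surrogate

# End point 2: count a 'movement' when displacement exceeds a noise threshold
n_movements = int((step_len > 1.0).sum())

print(time_taken, path_length, n_movements)
```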
Visuospatial and psychomotor ability screening

The Cattell–Horn–Carroll theory is a taxonomic system dividing human intelligence into cognitive strata that include visuospatial and psychomotor ability.58,59 Visuospatial ability itself is defined as the cognitive capacity to generate, retain, retrieve, manipulate, and process visual information.58 Psychomotor ability influences the capacity to precisely control objects via bimanual dexterity, and incorporates hand–eye coordination, reaction speed, and accuracy.60 In medicine, individuals with higher levels of visuospatial and psychomotor ability demonstrate greater proficiency in laparoscopic surgery, general surgery, interventional radiology, anatomy, endoscopy, and fibre-optic intubation tasks.61–69 Potentially, screening for these abilities may identify trainees of lower ability and help target educational interventions to assist them to attain competency.

Four studies have been performed in regional anaesthesia. The microcomputerised personnel aptitude tester was used to screen trainees for psychomotor ability and performance of labour epidurals (Dashfield and colleagues40). Proficiency was only weakly associated using Pearson correlation, and results may also be influenced by not using a formally recognized neurocognitive test.

The other three studies used tasks related to UGRA. Smith and colleagues28 recruited trainees and screened visuospatial and psychomotor abilities using a battery of neurocognitive tests. All trainees performed a needle visualisation task on a bench-top model, using time and image-quality end points. Only the block design test (a subset of the Wechsler Adult Intelligence Scale) was associated with performance, whilst psychomotor tests were not significant.

Visuospatial ability was again significant in a needle visualisation task (Shafqat and colleagues36) and brachial plexus sonography (Duce and colleagues29) studies, both finding that the Mental Rotations Test-A was significant in discriminating performance. In the former study, the reliability of the composite error score and GRS was also evaluated with ICC and Cronbach's α, with all results ≥0.90.

Checklists

Checklists are designed to assess the component skills of a procedure. Checklists can be specific for individual nerve blocks, or be intended for a range of blocks. Identification of the core skills can be determined informally through consensus opinions or formally through the Delphi process. During pilot and validation of new checklists, a psychometric analysis of individual items allows the identification of items with poor reliability.

Checklists provide a rich source for formative feedback, as weaker component skills within a whole performance can be identified. Contrariwise, checklists reward thoroughness in trainees, but do not necessarily discriminate between experts, as the latter group tends to use shortcuts in technique.1

One study in regional anaesthesia used a 61-item lumbar epidural checklist using consensus opinions from experts (Sivarajan and colleagues38). They defined eight 'critical items', such as accidental dural puncture, that were deemed an automatic unsatisfactory performance for the entire checklist. For initial piloting, assessors rated eight videotaped lumbar epidural insertions. Moderate to high reliability was confirmed with Cohen's κ, but neither construct validity nor feasibility was evaluated.

GRSs and derivatives

GRSs are differentiated from checklists by the use of Likert scales for scoring and inclusion of items assessing non-technical aspects, such as communication, teamwork, and efficiency of motion. The theoretical disadvantages of GRSs include the halo effect (the general impression of the trainee influencing scoring of individual items) and scale limitation (assessors marking in a small range of Likert scores instead of
using the entire scale).1 This has led to a belief that dichotomous checklists are objective, whilst the subjectivity inherent in GRSs is seen as a shortcoming. However, in multiple clinical studies, and repeated in a recent meta-analysis, both GRSs and checklists appear to be at least equally reliable.70–73

The original GRS was a seven-item tool scored on a 5-point Likert scale, introduced in 1997 for objective structured assessment of technical skills in surgery trainees.37 The direct observation of procedural skills (DOPS) tool was later introduced in 2002, also based on a Likert scoring scale.6 A type of DOPS used in the Australian and New Zealand College of Anaesthetists for regional anaesthesia workplace-based assessment was evaluated for its psychometric properties for a mixture of ultrasound-guided peripheral plexus blocks (Watson and colleagues52). Whilst exhibiting construct validity and feasibility, the external reliability of the DOPS using ICC (2A, 1) models was 0.32. The critical impact of an appropriate ICC model choice was highlighted; if two assessors are normally used to score trainees, then ICC (2A, k) modelling will instead provide a markedly improved reliability coefficient of 0.88 (Chuan and colleagues56).

Combined checklists and GRSs

Combining a checklist with a GRS has been suggested as the most appropriate tool to assess medical procedures.1 A combined tool can assess clinical technical skills and professional attributes. Chronologically, the first combined tools were developed for single regional anaesthesia procedures: landmark-guided obstetric epidural analgesia (Friedman and colleagues42) and spinal anaesthesia (Breen and colleagues45), landmark-based neurostimulation-guided interscalene brachial plexus blocks (Naik and colleagues48), and ultrasound-guided axillary brachial plexus blocks (Sultan and colleagues50). The last study is notable for its use of cognitive task analysis, a format of semi-structured interviews and cross-referencing against observations by experts, to break down complex procedures into smaller-component tasks.74 This helps avoid unintentional omission of critical tasks, which can be as high as 70% when using informal expert panels.

More recent studies have developed tools to encompass a range of regional anaesthesia procedures. Cheung and colleagues51 used a Delphi technique to construct a combined 22-item checklist and a nine-item GRS suitable for peripheral UGRA blocks, but did not undertake any psychometric analysis. This analysis was subsequently performed in 2014 in two separate studies with different results for reliability. Whilst both studies found evidence for construct validity, Wong and colleagues53 found that the inter-rater reliability was overall poor (ICC 0.44, single-measure ICC model 2A, 1) despite piloting, modifications to the descriptors, and assessor training. In contrast, Burckett-St Laurent and colleagues54 found moderate reliability for both checklist and GRS in the clinical setting, but less in the simulation environment (ICC 0.6–0.8, average-measure ICC model not specified). The difference in reliabilities reported by these two studies is large and likely because of the use of different ICC models.

Finally, our group designed a tool for use in any regional anaesthesia procedure (Chuan and colleagues55). Validation and reliability testing were conducted using videos of neuraxial and peripheral nerve blocks, performed by landmark-, neurostimulation-, and ultrasound-guided needle guidance. The external reliability was determined by two independent groups of experts in a test–retest pilot phase (Cohen's κ 0.70) and in a clinical validation phase using ICC (2A, k), with an overall reliability of 0.80.

Cumulative sum analysis

CUSUM analysis is a type of sequential analysis first used to measure quality assurance in industrial manufacturing. Used in medicine, a trainee's performance can be plotted over successive attempts as trending towards achieving, or moving away from, an agreed competency criterion.75 CUSUM is different from the psychometric framework and is not amenable to the statistical analyses described in this review. Instead, we summarized the major features of CUSUM and the insights this tool brings to the assessment of regional anaesthesia performance.

In the medical literature, the sequential probability-ratio CUSUM is the most common form. This requires an a priori decision on the acceptable α error (risk of mislabelling a trainee as incompetent) and β error (risk of mislabelling an incompetent trainee as competent), and on the acceptable (p0) and unacceptable (p1) failure rates. This generates statistical thresholds for either attaining or not attaining competency over time, denoted respectively as the h0 lower and h1 upper decision thresholds. The determination of success or failure for each attempt is a binary outcome and should use an objective end point. Performance that is trending towards competency is graphically represented as a decreasing CUSUM plot crossing the h0 lower threshold, whilst worsening performance causes the plot to increase towards the h1 upper threshold.
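A sketch of one common parameterisation of this sequential probability-ratio CUSUM, following the learning-curve CUSUM literature in general rather than any single study in this review; α, β, p0, and p1 are illustrative choices:

```python
import numpy as np

alpha, beta = 0.1, 0.1    # acceptable type I and type II error rates (illustrative)
p0, p1 = 0.10, 0.30       # acceptable and unacceptable failure rates (illustrative)

P = np.log(p1 / p0)
Q = np.log((1 - p0) / (1 - p1))
s = Q / (P + Q)                              # per-attempt decrement for a success
h0 = np.log((1 - alpha) / beta) / (P + Q)    # lower (competency) threshold
h1 = np.log((1 - beta) / alpha) / (P + Q)    # upper (incompetency) threshold

outcomes = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]    # 1 = success, 0 = failure (invented)
cusum, trace = 0.0, []
for ok in outcomes:
    cusum += -s if ok else (1 - s)           # successes descend, failures climb
    trace.append(round(cusum, 2))

print(trace)  # competency is signalled once the plot falls below -h0
```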
Disadvantages of using CUSUM in medical education have been raised, including logistical complexity, as multiple attempts per trainee are required to construct the CUSUM plot; differences in technical difficulty cannot be modelled; and previous performance is disproportionally weighed, which may not be reflective of current levels of competency.75 More recently, Sivaprakasam and Purva44 demonstrated that acceptable and unacceptable labour epidural failure rates are sensitive to small changes in a priori variables.

CUSUM articles have described the learning curve for a range of regional anaesthesia tasks. These have included part-tasks (ultrasound-guided needle-tip visualisation30,35 and neuraxial sono-anatomy34) and clinical procedures (lumbar neuraxial,31,39,41,44,46 caudal blocks,32 and penile blocks47). The studies show a wide inter-personal variability in learning adult blocks, but less for paediatric blocks.

Learning curves generated from CUSUM studies have had practical implications. Novices benefit greatly from a deliberate practice intervention35 vs a discovery-learning protocol30 when learning a needle-guidance task with ultrasound. Margarido and colleagues34 used CUSUM to trial the learning curve of a deliberate practice intervention for novices to learn neuraxial sono-anatomy, resulting in changes to their curriculum. Drake and colleagues46 used CUSUM to retrospectively analyse their labour epidural complications, which led them to modify their informed-consent material.

Discussion

The aims and nature of assessment in anaesthesia have fundamentally changed. Internationally, anaesthesia credentialing bodies have adopted competency-based curricula based on the principles of the adult lifelong learner.76 In this context, assessments have assumed importance as both
summative (assessment of learning and high-stakes tests) and formative (assessment for learning, feedback, and reflection) tools, anchored on predefined criteria and purposefully designed.8,77,78 The 2010 Ottawa documents provide a framework to perform psychometric evaluation of these assessment tools.

The significance of assessing the assessment tools should not be underestimated. For summative assessments, evidence of fairness and test score reproducibility is a minimum requirement for stakeholders. More recent tools have re-used or modified older checklists and GRSs, and thus, the original tools should be critically evaluated for validity and reliability. Lastly, research in regional anaesthesia education would be greatly advanced if we used accepted, common, and reliable tools. This would allow pooling of results in meta-analyses, comparisons between institutions, comparisons over time, and comparing the relative effectiveness of different educational interventions.

The literature search found that several types of assessment tools have been used to evaluate regional anaesthesia performance for different purposes, from part-tasks to workplace-based clinical assessment. There was recognition in all articles of the need to conform to the criteria of ideal tools, namely validity (in particular face, content, and construct validity) and reliability. A smaller number of articles evaluated feasibility, but no article attempted to evaluate educational benefit or stakeholder acceptability.

The included studies were heterogeneous in methodology, especially in statistical analysis, which makes interpretation difficult. A recurring problem identified in this review concerned reliability testing. In our experience, the modelling assumptions that greatly inflate the ICC coefficient are average measures and consistency agreement. Unless the assessment tool was purposefully designed to be scored by multiple assessors simultaneously, single-measure models should be used. With Likert scales, absolute-agreement models should be used. Researchers should state the ICC model used in their methodology, as incorrect modelling will provide a false assurance of reliability. Similarly, the limitations of Pearson correlation and Cronbach's α must be recognized as only describing reliability in ideal situations, and the use of an appropriate ICC model is instead encouraged.

Learning a complex procedure, such as UGRA, by deconstruction into component part-tasks comes from the surgical literature.79 This systematic approach of pre-training improves learning by concentrating on known quality-compromising behaviours seen in UGRA novices.80,81 Following this, current part-task assessment tools have focused on, first, competency in theoretical sono-anatomy; second, defining learning in ultrasound image acquisition; and, third, learning of real-time needle guidance in low- and high-fidelity phantoms. Further MCQs, constructed using the item-response theory as exemplified by Woodworth and colleagues,27 should be created to address important gaps, such as neuraxial sono-anatomy. A relative deficiency is the lack of a validated checklist or GRS tool to assess competency in practical UGRA skills using part-trainer models.

In contrast to part-tasks, workplace-based assessment tools aim to evaluate trainees over their entire regional anaesthesia clinical performance. Nearly all types of tools have been used in this context. Motion-analysis tools have demonstrated the ability to discriminate between novices and experts in both neuraxial and peripheral regional anaesthesia.43,49 However, practical educational interventions remain unknown, as motion analysis describes what experts already can do, but not how to progress novices to competent performance. CUSUM analysis has shown high variability in learning regional anaesthesia. More recent studies have explored how to integrate CUSUM techniques into prospective quality-assurance projects for both novices and experienced proceduralists,44 and these have a potential to provide formative feedback.

Combined checklists and GRSs have been used for neuraxial and peripheral nerve block assessment. An attractive aspect of the combined tools is the concurrent assessment of technical and non-technical skills and professional attributes. Regional anaesthesia experts are characterized not just by higher technical proficiency, but also by more developed cognitive (decision making, situational awareness, resource management, and anticipation of problems) and interpersonal skills (communication, teamwork, leadership, and self-awareness of limitations).82,83

The GRSs used in these combined tools are similar in design. Encouragingly, across different block types and study environments, validity and reliability appear to be consistent. In contrast, appraisal of the checklist component shows that further research is required. The most recent checklists were designed to be applicable across multiple block types (Cheung and colleagues,51 for ultrasound-guided peripheral blocks; Chuan and colleagues,55 for all peripheral and neuraxial blocks). In the former tool, the two validation studies showed a marked difference in reliability, whilst in the latter tool a confirmatory validation study is yet to be published. The attractiveness of either tool is broader applicability than tools designed for a particular block type. If consistent validity and reliability can be demonstrated, these tools could offer a standardized single-assessment format across multiple institutions.

There are several avenues for future research using assessment tools. Individuals learn differently, which contributes to the variability of the UGRA learning curve. Assessments could be repurposed as primary screening tools to identify novices who are predicted to struggle with UGRA skills. As an example, our current studies suggest that visuospatial, rather than psychomotor, ability has a better predictive value for future UGRA task performance.28,29,36,40 Educational interventions may then be more productively directed to these trainees, which is essential in our resource- and time-poor teaching environment.

Finally, understanding of non-technical attributes in regional anaesthesia lags behind our understanding of motor skills development.84 Technical performance of nerve blocks is but one aspect of delivery of anaesthesia care to a patient. As succinctly described by Neal,85 'focusing on technical success says nothing regarding the resident's overall competence as a physician'. Further simulation and best methods to train these important professional attributes are required.

The limitations of this review include potential bias from the process of article selection and from the quality of the included articles. The initial screening of citations was performed independently by blinded authors and the selected articles were then systematically evaluated. Whilst the process was methodical, relevant articles may have been inadvertently omitted. More seriously, bias in medical education articles is not as readily estimated as in clinical trials, because of quite different study designs. Subsequently, a risk of bias and quality of evidence evaluation was not performed; we instead chose to evaluate the statistical rigour and
quantitative results of the included studies against the psychometric framework.

In conclusion, competency-based assessment tools are used in all aspects of assessing regional anaesthesia performance, and provide information on the competency, safety, and educational progress of practitioners. The psychometric evaluation of these tools provides credibility of results obtained by the tool. Following published frameworks on methodology and statistical analysis will improve the reporting and interpretation of studies of regional anaesthesia assessment tools.

Authors' contributions

Study design derived from a section of a doctoral thesis: A.C.
Supervision of doctoral thesis: C.F.R., K.F.
Electronic search, study selection, and analysis: A.C., A.S.W.
Critical review of manuscript drafts and approval of the final version: all authors.

Declaration of interest

None declared.

Funding

Australian National Health and Medical Research Council Postgraduate Fellowship (APP1056280) to A.C.; Australian Society of Anaesthetists PhD Support Grant (2013) to A.C.

Appendix A. Supplementary data

Supplementary data related to this article can be found at https://doi.org/10.1016/j.bja.2017.09.007.

References

1. Bould M, Crabtree N, Naik V. Assessment of procedural skills in anaesthesia. Br J Anaesth 2009; 103: 472–83
2. Epstein R. Assessment in medical education. N Engl J Med 2007; 356: 387–96
3. Glavin R, Maran N. Editorial I: development and use of scoring systems for assessment of clinical competence. Br J Anaesth 2002; 88: 329–30
4. Ahmed K, Miskovic D, Darzi A, Athanasiou T, Hanna GB. Observational tools for assessment of procedural skills: a systematic review. Am J Surg 2011; 202: 469–80.e6
5. Kathirgamanathan A, Woods L. Educational tools in the assessment of trainees in anaesthesia. Contin Educ Anaesth Crit Care Pain 2011; 11: 138–42
6. Wragg A, Wade W, Fuller G, Cowan G, Mills P. Assessing the performance of specialist registrars. Clin Med 2003; 3: 131–4
7. Allen B, Sandberg W. Assessment tools: searching for purpose. Reg Anesth Pain Med 2015; 40: 299–300
8. Norcini J, Anderson B, Bollela V, et al. Criteria for good assessment: consensus statement and recommendations from the Ottawa 2010 conference. Med Teach 2011; 33: 206–14
9. Schuwirth L, Colliver J, Gruppen L, et al. Research in assessment: consensus statement and recommendations from the Ottawa 2010 conference. Med Teach 2011; 33: 224–33
10. Van der Vleuten C. The assessment of professional competence: developments, research and practical implications. Adv Health Sci Educ Theory Pract 1996; 1: 41–67
11. Gallagher AG, Ritter EM, Satava RM. Fundamental principles of validation, and reliability: rigorous science for the assessment of surgical education and training. Surg Endosc 2003; 17: 1525–9
12. Downing S. Validity: on meaningful interpretation of assessment data. Med Educ 2003; 37: 830–7
13. Downing S. Reliability: on the reproducibility of assessment data. Med Educ 2004; 38: 1006–12
14. Accreditation Council for Graduate Medical Education and the American Board of Anesthesiology. The anesthesiology milestone project. 2015. https://www.acgme.org/Portals/0/PDFs/Milestones/AnesthesiologyMilestones.pdf. [Accessed 6 May 2016]
15. Sites B, Chan V, Neal J, et al. The American Society of Regional Anesthesia and Pain Medicine and the European Society of Regional Anaesthesia and Pain Therapy Joint Committee recommendations for education and training in ultrasound-guided regional anesthesia. Reg Anesth Pain Med 2010; 35: S74–80
16. Kogan J, Holmboe E, Hauer K. Tools for direct observation and assessment of clinical skills of medical trainees: a systematic review. JAMA 2009; 302: 1316–26
17. Bravo G, Potvin L. Estimating the reliability of continuous measures with Cronbach's alpha or the intraclass correlation coefficient: toward the integration of two traditions. J Clin Epidemiol 1991; 44: 381–90
18. Chuan A, Graham P, Forrest K, et al. Regional anaesthesia assessment tools – a reply. Anaesthesia 2016; 71: 473–4
19. Cicchetti D. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess 1994; 6: 284–90
20. Bartko J. On various intraclass correlation reliability coefficients. Psychol Bull 1976; 83: 762–5
21. McGraw K, Wong S. Forming inferences about some intraclass correlation coefficients. Psychol Methods 1996; 1: 30–46
22. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979; 86: 420–8
23. Briesch A, Swaminathan H, Welsh M, Chafouleas S. Generalizability theory: a practical guide to study design, implementation, and interpretation. J Sch Psychol 2014; 52: 13–35
24. Fehr J, Boulet J, Waldrop W, Snider R, Brockel M, Murray D. Simulation-based assessment of pediatric anesthesia skills. Anesthesiology 2011; 115: 1308–15
25. Weller J, Jolly B, Misur M, et al. Mini-clinical evaluation exercise in anaesthesia training. Br J Anaesth 2009; 102: 633–41
26. Crossley J, Davies H, Humphris G, Jolly B. Generalisability: a key to unlock professional assessment. Med Educ 2002; 36: 972–8. Erratum: Med Educ 2003; 37: 574
27. Woodworth G, Carney P, Cohen J, et al. Development and validation of an assessment of regional anesthesia ultrasound interpretation skills. Reg Anesth Pain Med 2015; 40: 306–14
28. Smith H, Kopp S, Johnson R, Long T, Cerhan J, Hebl J. Looking into learning: visuospatial and psychomotor predictors of ultrasound-guided procedural performance. Reg Anesth Pain Med 2012; 37: 441–7
29. Duce N, Gillett L, Descallar J, Tran M, Siu S, Chuan A. Visuospatial ability and novice brachial plexus sonography performance. Acta Anaesthesiol Scand 2016; 60: 1161–9
30. de Oliveira Filho G, Helayel P, da Conceicao D, Garzel I, Pavei P, Ceccon M. Learning curves and mathematical models for interventional ultrasound basic skills. Anesth Analg 2008; 106: 568–73
31. de Oliveira Filho G. The construction of learning curves for basic skills in anesthetic procedures: an application for the cumulative sum method. Anesth Analg 2002; 95: 411–6
32. Schuepfer G, Konrad C, Schmeck J, Poortmans G, Staffelbach B, Johr M. Generating a learning curve for pediatric caudal epidural blocks: an empirical evaluation of technical skills in novice and experienced anesthetists. Reg Anesth Pain Med 2000; 25: 385–8
33. Friedman Z, Siddiqui N, Katznelson R, Devito I, Davies S. Experience is not enough: repeated breaches in epidural anesthesia aseptic technique by novice operators despite improved skill. Anesthesiology 2008; 108: 914–20
34. Margarido CB, Arzola C, Balki M, Carvalho JC. Anesthesiologists' learning curves for ultrasound assessment of the lumbar spine. Can J Anaesth 2010; 57: 120–6
35. Barrington M, Wong D, Slater B, Ivanusic J, Ovens M. Ultrasound-guided regional anesthesia: how much practice do novices require before achieving competency in ultrasound needle visualization using a cadaver model. Reg Anesth Pain Med 2012; 37: 334–9
36. Shafqat A, Ferguson E, Thanawala V, Bedforth N, Hardman J, McCahon R. Visuospatial ability as a predictor of novice performance in ultrasound-guided regional anesthesia. Anesthesiology 2015; 123: 1188–97
37. Martin J, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997; 84: 273–8
38. Sivarajan M, Lane P, Miller E, et al. Performance evaluation: continuous lumbar epidural anesthesia skill test. Anesth Analg 1981; 60: 543–7
39. Kestin I. A statistical approach to measuring the competence of anaesthetic trainees at practical procedures. Br J Anaesth 1995; 75: 805–9
40. Dashfield A, Coghill J, Langton J. Correlating obstetric epidural anaesthesia performance and psychomotor aptitude. Anaesthesia 2000; 55: 744–9
41. Naik V, Devito I, Halpern S. Cusum analysis is a useful tool to assess resident proficiency at insertion of labour epidurals. Can J Anaesth 2003; 50: 694–8
42. Friedman Z, Katznelson R, Devito I, Siddiqui M, Chan V. Objective assessment of manual skills and proficiency in performing epidural anesthesia – video-assisted validation. Reg Anesth Pain Med 2006; 31: 304–10
43. Hayter M, Friedman Z, Bould M, et al. Validation of the imperial college surgical assessment device (ICSAD) for labour epidural placement. Can J Anaesth 2009; 56: 419–26
44. Sivaprakasam J, Purva M. CUSUM analysis to assess competence: what failure rate is acceptable? Clin Teach 2010; 7: 257–61
45. Breen D, Bogar L, Heigl P, Rittberger J, Shorten G. Validation of a clinical assessment tool for spinal anaesthesia. Acta Anaesthesiol Scand 2011; 55: 653–7
46. Drake E, Coghill J, Sneyd J. Defining competence in obstetric epidural anaesthesia for inexperienced trainees. Br J Anaesth 2015; 114: 951–7
47. Schuepfer G, Jöhr M. Generating a learning curve for penile block in neonates, infants and children: an empirical evaluation of technical skills in novice and experienced anaesthetists. Paediatr Anesth 2004; 14: 574–8
48. Naik V, Perlas A, Chandra D, Chung D, Chan V. An assessment tool for brachial plexus regional anesthesia performance: establishing construct validity and reliability. Reg Anesth Pain Med 2007; 32: 41–5
49. Chin K, Tse C, Chan V, Tan J, Lupu C, Hayter M. Hand-motion analysis using the imperial college surgical assessment device: validation of a novel and objective performance measure in ultrasound-guided peripheral nerve blockade. Reg Anesth Pain Med 2011; 36: 213–9
50. Sultan S, Iohom G, Saunders J, Shorten G. A clinical assessment tool for ultrasound-guided axillary brachial plexus block. Acta Anaesthesiol Scand 2012; 56: 616–23
51. Cheung J, Chen E, Darani R, McCartney C, Dubrowski A, Awad I. The creation of an objective assessment tool for ultrasound-guided regional anesthesia using the Delphi method. Reg Anesth Pain Med 2012; 37: 329–33
52. Watson M, Wong D, Kluger R, et al. Psychometric evaluation of a direct observation of procedural skills assessment tool for ultrasound-guided regional anaesthesia. Anaesthesia 2014; 69: 604–12
53. Wong D, Watson M, Kluger R, et al. Evaluation of a task-specific checklist and global rating scale for ultrasound-guided regional anesthesia. Reg Anesth Pain Med 2014; 39: 399–408
54. Burckett-St Laurent D, Niazi A, Cunningham M, et al. A valid and reliable assessment tool for remote simulation-based ultrasound-guided regional anesthesia. Reg Anesth Pain Med 2014; 39: 496–501
55. Chuan A, Graham P, Wong D, et al. Design and validation of the regional anaesthesia procedural skills assessment tool. Anaesthesia 2015; 70: 1401–11
56. Chuan A, Thillainathan S, Graham P, et al. Reliability of the direct observation of procedural skills assessment tool for ultrasound-guided regional anaesthesia. Anaesth Intensive Care 2016; 44: 201–8
57. van Hove P, Tuijthof G, Verdaasdonk E, Stassen L, Dankelman J. Objective assessment of technical surgical skills. Br J Surg 2010; 97: 972–87
58. McGrew K. CHC theory and the human cognitive abilities project: standing on the shoulders of the giants of psychometric intelligence research. Intelligence 2009; 37: 1–10
59. Schneider W, McGrew K. The Cattell–Horn–Carroll model of intelligence. In: Flanagan P, Harrison P, editors. Contemporary intellectual assessment: theories, tests, and issues. New York: Guildford Press; 2012. p. 99–144
60. Ackerman P. Determinants of individual differences during skill acquisition: cognitive abilities and information processing. J Exp Psychol Gen 1988; 117: 288–318
61. Enochsson L, Westman B, Ritter E, et al. Objective assessment of visuospatial and psychomotor ability and flow of residents and senior endoscopists in simulated gastroscopy. Surg Endosc 2006; 20: 895–9
62. Luursema J-M, Buzink S, Verwey W, Jakimowicz JJ. Visuo-spatial ability in colonoscopy simulator training. Adv Health Sci Educ Theory Pract 2010; 15: 685–94
63. Brandt M, Davies E. Visual–spatial ability, learning modality and surgical knot tying. Can J Surg 2006; 49: 412–6
64. Gallagher A, Cowie R, Crothers I, Jordan-Black J, Satava R. PicSOr: an objective test of perceptual skill that predicts laparoscopic technical skill in three initial studies of laparoscopic performance. Surg Endosc 2003; 17: 1468–71
65. Hedman L, Strom P, Andersson P, Kjellin A, Wredmark T, Fellander-Tsai L. High-level visual–spatial ability for novices correlates with performance in a visual–spatial complex surgical simulator task. Surg Endosc 2006; 20: 1275–80
66. Van Herzeele I, O'Donoghue K, Aggarwal R, Vermassen F, Darzi A, Cheshire N. Visuospatial and psychomotor aptitude predicts endovascular performance of inexperienced individuals on a virtual reality simulator. J Vasc Surg 2010; 51: 1035–42
67. Sweeney K, Hayes J, Chiavaroli N. Does spatial ability help the learning of anatomy in a biomedical science course? Anat Sci Educ 2014; 7: 289–94
68. Gettman M, Kondraske G, Traxer O, et al. Assessment of basic human performance resources predicts operative performance of laparoscopic surgery. J Am Coll Surg 2003; 197: 489–96
69. Dashfield A, Smith J. Correlating fibreoptic nasotracheal endoscopy performance and psychomotor aptitude. Br J Anaesth 1998; 81: 687–91
70. Morgan P, Cleave-Hogg D, Guest C. A comparison of global ratings and checklist scores from an undergraduate assessment using an anesthesia simulator. Acad Med 2001; 76: 1053–5
71. Cunnington J, Neville A, Norman G. The risks of thoroughness: reliability and validity of global ratings and checklists in an OSCE. Adv Health Sci Educ Theory Pract 1996; 1: 227–33
72. Regehr G, MacRae H, Reznick R, Szalay D. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med 1998; 73: 993–7
73. Ilgen J, Ma I, Hatala R, Cook D. A systematic review of validity evidence for checklists versus global rating scales in simulation-based assessment. Med Educ 2015; 49: 161–73
74. Clark R, Pugh C, Yates K, Inaba K, Green D, Sullivan M. The use of cognitive task analysis to improve instructional descriptions of procedures. J Surg Res 2012; 173: e37–42
75. Norris A, McCahon R. Cumulative sum (CUSUM) assessment and medical education: a square peg in a round hole. Anaesthesia 2011; 66: 250–4
76. Ebert T, Fox C. Competency-based education in anesthesiology: history and challenges. Anesthesiology 2014; 120: 24–31
77. Veloski J, Boex JR, Grasberger MJ, Evans A, Wolfson DB. Systematic review of the literature on assessment, feedback and physicians' clinical performance: BEME guide no. 7. Med Teach 2006; 28: 117–28
78. Wood T. Assessment not only drives learning, it may also help learning. Med Educ 2009; 43: 5–6
79. Aggarwal R, Grantcharov T, Darzi A. Framework for systematic training and assessment of technical skills. J Am Coll Surg 2007; 204: 697–705
80. Slater R, Castanelli D, Barrington M. Learning and teaching motor skills in regional anesthesia: a different perspective. Reg Anesth Pain Med 2014; 39: 230–9
81. Gallagher A, Ritter E, Champion H, et al. Virtual reality simulation for the operating room: proficiency-based training as a paradigm shift in surgical skills training. Ann Surg 2005; 241: 364–72
82. Fletcher G, McGeorge P, Flin R, Glavin R, Maran N. The role of non-technical skills in anaesthesia: a review of current literature. Br J Anaesth 2002; 88: 418–29
83. Smith A, Pope C, Goodwin D, Mort M. What defines expertise in regional anaesthesia? An observational analysis of practice. Br J Anaesth 2006; 97: 401–7
84. Nix C, Margarido C, Awad I, et al. A scoping review of the evidence for teaching ultrasound-guided regional anesthesia. Reg Anesth Pain Med 2013; 38: 471–80
85. Neal J. Education in regional anesthesia: caseloads, simulation, journals, and politics: 2011 Carl Koller lecture. Reg Anesth Pain Med 2012; 37: 647–51

Handling editor: J.G. Hardman
