
Journal of Medical Imaging and Radiation Oncology 66 (2022) 225–232

MEDICAL IMAGING—REVIEW ARTICLE

Quality use of artificial intelligence in medical imaging: What do radiologists need to know?



Stacy K Goergen,1,2 Helen ML Frazer3,4 and Sandeep Reddy5
1 Monash Imaging, Monash Health, Melbourne, Victoria, Australia
2 Department of Imaging, School of Clinical Sciences, Monash University, Melbourne, Victoria, Australia
3 St Vincent’s BreastScreen, St Vincent’s Hospital Melbourne, Melbourne, Victoria, Australia
4 BreastScreen Victoria, Melbourne, Victoria, Australia
5 School of Medicine, Deakin University, Geelong, Victoria, Australia

SK Goergen MBBS, MClinEpi, FRANZCR; HML Frazer MBBS, FRANZCR, M.Epi; S Reddy MBBS, MSc, PhD.

Correspondence
Dr Stacy K Goergen, Monash Imaging, Monash Health, 246 Clayton Rd, Clayton, Vic. 3168, Australia.
Email: stacy.goergen@monashhealth.org

Submitted 29 November 2021; accepted 14 December 2021.

doi:10.1111/1754-9485.13379

Abstract
The application of artificial intelligence, and in particular machine learning, to the practice of radiology is already impacting the quality of imaging care. It will increasingly do so in the future. Radiologists need to be aware of factors that govern the quality of these tools at the development, regulatory and clinical implementation stages in order to make judicious decisions about their use in daily practice.

Key words: artificial intelligence; medical imaging.

Introduction

Quality in health care can be defined as consistent provision of safe, effective, timely, patient-centred and appropriate care.1 The recent exponential growth of artificial intelligence (AI) and machine learning (ML) applications for medical imaging has the potential to profoundly impact our ability as clinicians to provide quality care.

Development of AI / ML tools for medical imaging interpretation is rapidly accelerating due to increasingly large digitised image data sets, availability of open-source algorithms, and increased computing power and cloud services, alongside developments in deep-learning techniques such as convolutional neural networks (CNNs), which analyse 'non-linear' relationships between input data to draw conclusions about outputs. The potential of CNNs, the prototype deep-learning architecture for image analysis, to analyse high-resolution images was demonstrated as recently as the ImageNet Challenge of 2012, described by Russakovsky et al.,2 in which a form of CNN termed 'AlexNet' significantly advanced the ability of AI to detect and interpret images.

It is highly likely that the application of AI to the acquisition, reconstruction, triage, presentation, interpretation and reporting of, as well as post-interpretative communication about, medical imaging studies will be at least as impactful to the field and to patient care as was the introduction of cross-sectional imaging in the previous century. The opportunity to transform the capacity and accuracy of image interpretation, enabling radiologists to spend more time examining complex cases, personalising patient care and conducting research, is now envisioned. Topol's concept of 'Deep Medicine' is that humans and AI can work synergistically, in contrast to 'Shallow Medicine', characterised by brief clinical encounters, poor evidence support and inaccurate tools. 'Deep Medicine' enabled by AI tools can promote better care, better evidence and greater accuracy.3 Such applications have also been suggested as remedies for radiologist burnout which, somewhat paradoxically, has developed in parallel with the transition from paper and film to digital, bringing with it an unrelenting cognitive load.4

Automation of tasks that radiologists probably perform suboptimally, such as rib fracture and lung nodule detection with CT,5,6 reassessment of the size of a pleural effusion on sequential chest radiographs, or mammographic screening,7 seems like an attractive way to free the radiologist to perform higher level interpretive tasks that may not be amenable to automation and to conduct activities like multidisciplinary team meetings.


AI-based triage of the most urgent cases for reporting, such as CT scans harbouring pulmonary embolism,8 chest radiographs demonstrating pneumothorax9 or head CTs containing haemorrhage,10 could reduce patient morbidity and length of stay in emergency departments. Synthetic MRI, which utilises deep-learning systems to post-process and reconstruct MR image data such that image acquisition time is reduced without noticeable degradation of image quality, may improve efficiency, reduce costs and improve access.11

However, the rapidly increasing development of AI applications for medical imaging, their commercialisation, and subsequent approval by regulators for clinical use have come with concerns regarding the actual performance of these tools in day-to-day clinical practice.12–15 Furthermore, bias within training and / or validation data sets, lack of uniformity of reporting standards in scientific publications relating to algorithm performance, and the absence of postmarket surveillance or transparent processes to define, monitor and remediate performance that falls below that which led to regulatory approval in the first place are significant quality issues at the heart of our work as clinicians. Insufficient validation of many commercially available algorithms is a direct result of AI developers lacking access to enough appropriately labelled, good-quality clinical image data.

Radiologists largely take for granted that our imaging equipment will perform within the specifications provided by the vendor, and that automatic checks on things such as gradient and main field homogeneity, CT exposures, focal spot size and CT detector performance will occur. However, these quality assurance features are not a routine part of ML applications for interpretative or non-interpretative tasks, and the ability of such applications to continuously modify their performance, or 'learn', over time poses both regulatory and implementation challenges. Put simply, how do radiologists know that the AI software application they implemented to replace or augment an interpretative or other task is still doing what it is supposed to be doing?

Measuring, maintaining and improving quality in the AI era pose new challenges at both the regulatory and clinical level. There is a clear need to require documentation of the intended use and performance of AI tools, and for a quality assurance programme that enables audit and testing to detect drift and bias in high-stakes health decisions. While the solutions to these challenges are neither simple nor necessarily obvious, an awareness of the issues themselves is essential. Radiologists remain ultimately responsible for the systems we put in place to deliver care to our patients and community; as such, we need to be educated users of the technological advances that are a rapidly changing fact of life in our specialty.

The remainder of this article will focus on three main themes relating to quality use of AI / ML applications in medical imaging:
1 Design and reporting of studies of AI / ML applications in imaging—standardisation to improve quality
2 Regulation of ML software applications as medical devices—what is needed to support quality care?
3 Clinical evaluation and implementation of AI / ML algorithms to measure and improve quality

Design and reporting of AI / ML studies: Standardisation to improve quality

All tools used to support clinical decision-making require validation, and the standard has been the reporting of randomised clinical trials. The SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) and CONSORT (Consolidated Standards of Reporting Trials) statements are reporting guidelines for protocols of, and completed, randomised controlled trials (RCTs) respectively.16,17 Recently, these consensus-based statements have been extended as the SPIRIT-AI18 and CONSORT-AI19 statements, in recognition that existing appraisal tools for evaluating the quality of RCTs do not adequately address key issues relating to AI / ML interventions. Draft item lists were reviewed, refined and modified by stakeholders in two online Delphi surveys. The 98 stakeholders who developed the statements included clinicians, computer science / ML experts, methodologists, statisticians, funders, regulators, journal editors, patients and policymakers.

The final SPIRIT-AI and CONSORT-AI statements were published in Nature Medicine, The British Medical Journal and The Lancet Digital Health in September 2020.17–19 There are 26 candidate items for SPIRIT-AI and 29 for CONSORT-AI, including description of the inputs and outputs of the algorithm, how the algorithm makes recommendations and fits into a clinical pathway, considerations for continuously improving algorithms, sources of bias and poor generalisability, and statements on algorithm ownership and potential / actual conflicts of interest. Journals that publish scientific papers relating to ML / AI algorithms are being asked by the SPIRIT-AI and CONSORT-AI initiative to formally endorse the statements and use them in their peer-review processes in order to standardise the review criteria applied by journals, and thus the 'burden of proof' required before manuscripts are accepted and published. Tools for assessment of the risk of bias (PROBAST-AI) and for the reporting of AI diagnostic and risk prediction studies (TRIPOD-AI) are underway.20 These too will support greater transparency and more uniform peer-review and critical appraisal processes, and thus indirectly improve the quality of published work.

Of particular relevance to diagnostic radiologists is the work to develop STARD-AI (Standards for Reporting of Diagnostic Accuracy Studies—Artificial Intelligence), an extension of the original STARD Statement, which was updated in 2016.


Sounderajah et al.21 state, in their paper describing the STARD-AI methodology, that '…much of the evidence supporting diagnostic algorithms has been disseminated in the absence of AI-specific reporting guidelines'. Varying definitions of the term 'validation' and unfamiliar outcome measures (e.g. the Jaccard similarity coefficient and F-score) limit the ability of clinicians to translate AI tools into better health outcomes, because they do not necessarily understand or trust their performance.
To date, there are only 10 recorded RCTs of AI models in health care and only two published (one in gastroenterology, one in ophthalmology), with the largest sample size being 1,000 participants.22 A recent systematic review of one of the more advanced areas of AI development in image interpretation, the classification of mammograms, concluded that no study had adequately validated its use in screening programmes.15

Regulation and regulatory frameworks: What is needed to support quality care?

Regulators have had to consider how to adapt their evaluation frameworks to be fit for purpose for medical devices that are based on AI / ML, in the face of rapidly growing demand for regulatory approval of commercialised products. In October 2021, the U.S. Food and Drug Administration (FDA), along with Health Canada and the U.K.'s Medicines and Healthcare products Regulatory Agency (MHRA), released a list of 10 guiding principles to inform development of regulatory frameworks for AI / ML applications (Fig. 1).23 This was one of the intended outputs of the FDA's Software as a Medical Device (SaMD) Action Plan, announced in January 2021.24

Standardisation of reporting of the key attributes and performance of a product, as described in the previous section, could be extremely helpful in this regard. Apart from the various attributes and development methods of the application that exist at the time of regulatory consideration, there is the important concept of 'continuous learning'. Unlike a hip prosthesis or contrast injection pump, the performance of algorithms may not be 'static' following their approval for clinical use, depending upon their design and what is stipulated in the approval process. In fact, without intervention, their performance normally deteriorates over time, due to 'concept drift' arising from changes to the statistical input to the algorithm that, in turn, are due to changes in things like patient population, scanner platform, scanning protocols or contrast dosage, to name a few. Such 'drift' can potentially be ameliorated by continuous learning AI that continues to improve as a consequence of changing environments and data. However, this requires a specific 'feedback loop' to exist from a reference standard or 'ground truth', whether this be a diagnosis (e.g. a breast cancer on a mammogram) or a particular outcome (e.g. generation of a diagnostic quality image following application of a denoising algorithm). In addition, this brings with it the potential for 'catastrophic forgetting' or 'catastrophic interference', whereby model performance deteriorates abruptly following retraining of a model with new data.25 Hence, an essential component of any clinically implemented 'continuous learning' model is a feedback loop from accurately labelled and appropriate data sets of sufficient size and composition, and a clear operational and measurable definition of what it is that the model is supposed to be doing.
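To make the idea of such a feedback loop concrete, the sketch below shows one minimal form it could take: for recent cases in which a ground-truth label has become available (for example, from the final report or a cancer registry), performance is recomputed over a rolling window and compared with the figures recorded at acceptance testing, raising an alert when it falls beyond a tolerance. The window size, thresholds and field names are hypothetical, not drawn from any particular product.

```python
# Minimal sketch of a performance feedback loop for a deployed model.
# Assumes each completed case eventually yields a ground-truth label
# alongside the model's output. All figures here are hypothetical.

from collections import deque

BASELINE = {"sensitivity": 0.93, "specificity": 0.90}  # from acceptance testing
TOLERANCE = 0.05                                       # allowed absolute drop
WINDOW = 500                                           # most recent labelled cases

recent = deque(maxlen=WINDOW)                          # (model_positive, truth_positive)

def record_case(model_positive: bool, truth_positive: bool) -> None:
    """Append one labelled case to the rolling window."""
    recent.append((model_positive, truth_positive))

def current_performance() -> dict:
    """Sensitivity and specificity over the rolling window."""
    tp = sum(1 for m, t in recent if m and t)
    tn = sum(1 for m, t in recent if not m and not t)
    pos = sum(1 for _, t in recent if t)
    neg = len(recent) - pos
    return {"sensitivity": tp / pos if pos else None,
            "specificity": tn / neg if neg else None}

def drift_alert() -> list:
    """Names of any metrics that have fallen below baseline minus tolerance."""
    perf = current_performance()
    return [name for name, value in perf.items()
            if value is not None and value < BASELINE[name] - TOLERANCE]
```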
While the important details of regulatory frameworks specific to AI in medical imaging continue to evolve internationally, albeit more slowly than the applications for device approval, Gilbert et al.,26 in a review of the current state of these frameworks, highlight the primary need for international harmonisation of regulations that govern transparency of AI quality management systems and algorithm change protocols.

This raises the issue of the regulatory challenges of approving a medical device whose performance changes; specifically, is a re-approval process required, and if so, what constitutes a significant change to the device when this is a continuous learning AI algorithm? Is it practical for this to occur, or does it represent a major impediment to dissemination of AI applications in clinical practice?

Fig. 1. FDA / MHRA / Health Canada guiding principles for good machine learning practice in medical device development.


The recently released FDA Action Plan provides an important modification to the decades-long principle that a medical device should be defined, thoroughly tested and meticulously documented before approval, and that its subsequent safety, efficacy and patient benefit should be continuously monitored by both clinical (postmarket) surveillance and tight regulatory controls over any fundamental changes to the device made following initial regulatory approval. The Action Plan poses a solution to the challenges of continuous learning AI SaMD in particular: defining, at the time of original regulatory approval, limits on the expected change in the clinical behaviour of adaptive ML-based SaMD, along with methods to continuously monitor the extent of change and to determine its conformity with these a priori 'set' boundaries. This equates to what Gilbert et al. have defined as 'premarket comprehensive consideration' of the changes that would be acceptable for the device. The Australian Therapeutic Goods Administration (TGA) made changes to the regulation of SaMD in February 2021,27 and released a further guidance document relating to regulation of SaMD (which is still in draft form) in October 2021,28 but this guidance does not provide specific detail relating to adaptive or 'continuous learning' AI. It remains to be seen whether the TGA will follow FDA recommendations for such devices or develop more prescriptive requirements; this decision will have a significant influence on the development and implementation of AI tools in medical imaging in Australia and New Zealand. In addition, it is self-evident that there will be room for interpretation regarding whether a change to the performance of a software product as a result of continuous learning is, or is not, consistent with the 'premarket comprehensive consideration'; this interpretation has clear cost implications for AI / ML SaMD developers and safety implications for patients. While the concept itself is simple and logical, its implementation may not be.
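One way to picture the 'premarket comprehensive consideration' is as a set of machine-readable acceptance bounds fixed at the time of approval, against which every retrained version of the model is checked before release. The sketch below is purely illustrative; the metrics and limits are invented and would in practice be specified in the approved change control plan.

```python
# Illustrative only: checking a retrained model version against performance
# boundaries declared at the time of regulatory approval ('a priori set
# boundaries'). Metrics and limits are invented for demonstration.

APPROVED_BOUNDS = {
    "sensitivity": (0.90, 1.00),        # must not fall below 0.90
    "specificity": (0.85, 1.00),
    "cases_flagged_per_1000": (5, 40),  # workload must stay within this range
}

def conforms(candidate_metrics: dict) -> bool:
    """True only if every monitored metric stays within its approved bounds."""
    for metric, (low, high) in APPROVED_BOUNDS.items():
        value = candidate_metrics.get(metric)
        if value is None or not (low <= value <= high):
            return False
    return True

# A retrained version whose specificity has slipped below the agreed floor
print(conforms({"sensitivity": 0.92,
                "specificity": 0.83,
                "cases_flagged_per_1000": 18}))  # False -> do not deploy
```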
The importance of appropriately qualified individuals from multiple disciplines being involved in creating the new frameworks within which regulation of medical AI will be delivered cannot be overstated. Ethical considerations relating to bias in algorithm development and deployment, as well as privacy and data security, especially when data sharing agreements are entered into to facilitate continuous learning of AI applications, need urgent integration into regulatory frameworks.29 The Royal Australian and New Zealand College of Radiologists released its position statement 'Ethical Principles for Artificial Intelligence in Medicine' in August 2019, setting out nine principles to guide the development of professional and practice standards regarding the research and deployment of AI, and subsequently its Standards of Practice for Artificial Intelligence in September 2020, to lead the profession and inform regulators, patients, consumers and AI developers / vendors about quality use of AI in medical imaging.30,31

Clinical evaluation and implementation of AI applications: Measurement and improvement of quality

One of the key enablers of clinical implementation, along with interoperability of AI algorithms with current radiology systems and standards (such as Digital Imaging and Communications in Medicine [DICOM], Health Level-7 [HL7] and Fast Healthcare Interoperability Resources [FHIR]), is trust. While post hoc explanation of what is in the 'black box' of AI has previously been proposed as the best way to achieve this, using techniques such as heat or saliency maps for a radiologist to 'understand' how a model reaches a decision in an individual case can be unrealistic in practice32,33 (Fig. 2).
Indeed, the term 'explainability' is itself variously defined and used in the context of AI, and 'explainable AI' (XAI) is a growing field of research. Explainability can best be thought of as a combination of various techniques to make the decision-making process of algorithmic models understandable and closer to transparency. This concept encompasses comprehensibility, interpretability, explainability, causability and understandability.

The well-known example of the ML algorithm that was highly accurate for pneumothorax detection on chest radiographs until the chest tube was masked,9 whereupon its performance plummeted, is not reproducible in its simple 'understandability' across the increasingly broad range of AI applications in imaging. In addition, the human tendency to try to determine whether the model used the same information that we, as radiologists, would have used to reach a diagnosis can actually confound our ability to comprehend and understand how a model is reaching a conclusion or prediction, and even to identify when a model makes clinically important errors.32 Reddy et al., Kelly et al. and Amann et al.34–36 have suggested that explainability may be the first casualty of high-performing deep-learning algorithms. The balance between 'explainability' and accuracy remains controversial, and medical applications of AI / ML are perhaps the most difficult domain in which this controversy will play out in the near future.

Larson et al.37 and Ghassemi et al.32 advocate that 'meticulous' and extensive validation and ongoing automated monitoring of the performance of AI systems are essential components of any AI implementation, particularly where explainability may not be achievable. In particular, Larson et al. point out that the guard rails we place around the performance of other medical devices are often absent in medical AI applications. Furthermore, of the eight basic elements of a quality control process, performance measures, targets, monitoring and feedback are the most important; prediction and optimisation models (akin to Google Maps and satellite navigation, respectively) are the least critical of these, but are the elements of process control where AI performs best.

Indeed, the parameters within which AI applications are supposed to perform are currently neither defined nor readily monitored for many commercial applications, except by relatively laborious, manual processes that involve comparison with actual reader interpretations. This is costly.


Fig. 2. Saliency maps. Visualisation of saliency maps generated for three breast cancer examples (from Ref. 33).

In a recent description of the implementation of AI software to detect pneumothorax,38 the first step was to assess the performance of the software using local chest radiographs. This involved identification of chest radiographs with large and small pneumothoraces, which two subspecialist thoracic radiologists then read among 150 cases mixed with chest radiographs demonstrating no pneumothorax. This process needed regular repetition in order to monitor the application for drift in performance.
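Such a local validation exercise reduces, in essence, to comparing the software's output with the radiologists' consensus labels on an enriched sample and quantifying the uncertainty the sample size allows. The sketch below is a schematic of that comparison; the counts are invented for illustration and are not those of the cited study.

```python
# Schematic of a local pre-implementation check: compare AI output with the
# readers' consensus on an enriched local sample. Counts are invented.

import math

def sensitivity_specificity(cases):
    """cases: list of (consensus_positive, ai_positive) tuples."""
    tp = sum(1 for truth, ai in cases if truth and ai)
    fn = sum(1 for truth, ai in cases if truth and not ai)
    tn = sum(1 for truth, ai in cases if not truth and not ai)
    fp = sum(1 for truth, ai in cases if not truth and ai)
    return tp / (tp + fn), tn / (tn + fp)

def wilson_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (Wilson score)."""
    p = successes / n
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = (z / (1 + z * z / n)) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

# Hypothetical 150-film sample: 60 with pneumothorax, 90 without
cases = ([(True, True)] * 52 + [(True, False)] * 8
         + [(False, False)] * 84 + [(False, True)] * 6)
sens, spec = sensitivity_specificity(cases)
print(round(sens, 2), wilson_ci(52, 60))   # sensitivity ~0.87 with its 95% CI
print(round(spec, 2), wilson_ci(84, 90))   # specificity ~0.93 with its 95% CI
```

Repeating the same comparison on newly accrued cases at regular intervals is what turns this one-off check into the ongoing monitoring for drift described above.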
The importance of transparent performance metrics, pre-implementation validation on large, diverse data sets and ongoing monitoring of performance is no less for AI applications performing non-interpretive tasks. These include natural language processing to determine compliance of referrals with required descriptors for the examination, or use of standardised reporting templates,39 or the reduction of radiation dose and artefacts.

When considering the purchase and / or implementation of an AI application, Filice et al.40 suggest that the first questions should be 'Is there a problem that needs fixing?' and 'Does the AI tool deliver value?', that is, improved quality and / or reduced cost. Many commercially available applications are designed to do just one thing, such as identifying a pneumothorax or pulmonary embolism. If there is no evidence that the task is currently being done in a manner detrimental to timely and / or safe patient care, it may not be cost-effective to delegate this piece of the overall interpretation task to AI.


Consideration of the 'value proposition' is just as important for AI tools that perform non-interpretive tasks intended to improve quality.41,42

Current AI applications generally do not have any 'built-in' tools to assess the value that they deliver to their customers. Furthermore, many customers do not have the informatics infrastructure to determine the added financial value, or costs, of AI algorithms. For example, if an AI tool designed to identify cerebral haemorrhage on a CT scan and then prioritise the study on a worklist can only identify haemorrhages large enough to be recognised and flagged by the CT technologist, the cost of purchase and ongoing performance monitoring of such applications may not be outweighed by the benefits. This will obviously depend on the frequency of occurrence of cerebral haemorrhage and the efficiency of existing communication between technologist and radiologist, the latter potentially facilitated by inexpensive messaging applications. However, in resource-constrained settings where there is no radiological interpretation available, such tools may play a very important role in improving health equity, by expediting recognition of conditions requiring emergency treatment or patient transfer. Automated mechanisms that allow ongoing evaluation of tool performance and value delivery need to become an integral part of AI applications in medical imaging; developers should recognise incorporation of these into their offerings as a potential competitive advantage in the marketplace.

Data sharing arrangements improve the quality of AI tools in medical imaging through training and validation on a wide range of data, whether this be images, text or numeric output from imaging equipment. This can, but does not always, reduce the bias and unintentional widening of health disparities that arise when algorithms are designed using data unrepresentative of the population to which they will be applied. However, access to high-quality, curated and accurately labelled data has now become more valuable than the algorithms themselves. Along with the need to confirm the purported performance of AI software prior to implementation, and then to plan and resource an ongoing quality monitoring programme, this issue represents the greatest impediment to more widespread integration of AI tools into medical imaging workflow.

Health services may be approached by AI developers who need access to their patients' imaging data in order to train and validate algorithms. However, health services are often less sophisticated than universities in negotiating appropriate fair value, royalty and intellectual property arrangements with developers. This is to the detriment of the health service itself, patients, radiologists (who are de facto providing ground truth and labelled data through their reports), taxpayers and the health system as a whole.

Innovative approaches to the sharing and / or use of sensitive health data, including identified images, while protecting patient privacy and data security, are urgently needed if AI algorithm development, validation, and pre-clinical-implementation performance assessment are to be potentiated. The Secure Research Platform (SeRP) at Monash University43 and the ACR AI Lab Federated Learning initiative, commencing 2022, are examples of strategies to address this problem in different ways. The former allows sensitive health data to be used and shared within a secure environment, and the latter enables multiple actors to build a common, robust machine learning model without sharing data, thus addressing critical data sovereignty issues such as privacy, security and access rights.
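Federated learning of this kind can be pictured as each site training on its own data and sharing only model parameters, which a coordinating server then averages. The sketch below is a conceptual illustration using a toy logistic-regression model and synthetic data; it is not the ACR AI Lab implementation, and all names and figures are invented.

```python
# Toy sketch of federated averaging: each site fits a model on its own
# (synthetic) data and shares only parameters, never the images themselves.

import numpy as np

rng = np.random.default_rng(0)

def local_update(global_weights, features, labels, lr=0.1, epochs=20):
    """Logistic-regression gradient descent performed entirely on-site."""
    w = global_weights.copy()
    for _ in range(epochs):
        preds = 1 / (1 + np.exp(-features @ w))
        w -= lr * features.T @ (preds - labels) / len(labels)
    return w

# Three hypothetical hospitals whose data never leave the site
sites = [(rng.normal(size=(100, 5)), rng.integers(0, 2, size=100))
         for _ in range(3)]

global_w = np.zeros(5)
for _ in range(10):                                   # ten federation rounds
    local_ws = [local_update(global_w, X, y) for X, y in sites]
    global_w = np.mean(local_ws, axis=0)              # server averages the weights
print(global_w)                                       # shared model, no shared data
```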
Conclusion

Apart from the quality-related issues described here in relation to AI / ML applications at the development, regulatory and clinical implementation stages, there remain a range of ethical, legal and social dilemmas to solve if AI, particularly when it includes autonomous decision-making and continuous learning, is to be widely and successfully implemented in clinical settings.42 Radiologists are at the forefront of AI application to medical practice and as such have a special responsibility to understand and judiciously apply these new tools to protect and enhance the quality of care we provide to our patients and community.

Data availability statement

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

References
1. Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century. The National Academies Press, Washington, DC; 2001. https://doi.org/10.17226/10027.
2. Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis 2015; 115: 211–52.
3. Topol E. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again. Hachette UK, London, 2019.
4. Dhanoa D, Dhesi TS, Burton KR, Nicolaou S, Liang T. The evolving role of the radiologist: The Vancouver workload utilization evaluation study. J Am Coll Radiol 2013; 10: 764–9.
5. Aissa J, Schaarschmidt BM, Below J, et al. Performance and clinical impact of machine learning based lung nodule detection using vessel suppression in melanoma patients. Clin Imaging 2018; 52: 328–33.
6. Zhou QQ, Wang J, Tang W, et al. Automatic detection and classification of rib fractures on thoracic CT using convolutional neural network: Accuracy and feasibility. Korean J Radiol 2020; 21: 869–79.


7. Pacilè S, Lopez J, Chone P, Bertinotti T, Grouin JM, Fillard P. Improving breast cancer detection accuracy of mammography with the concurrent use of an artificial intelligence tool. Radiol Artif Intell 2020; 2: e190208.
8. Soffer S, Klang E, Shimon O, et al. Deep learning for pulmonary embolism detection on computed tomography pulmonary angiogram: A systematic review and meta-analysis. Sci Rep 2021; 11: 15814.
9. Rajpurkar P, Irvin J, Zhu K, et al. CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv, abs/1711.05225. 2017.
10. Lee JY, Kim JS, Kim TY, et al. Detection and classification of intracranial haemorrhage on CT images using a novel deep-learning algorithm. Sci Rep 2020; 10: 20546.
11. Fayad LM, Parekh VS, de Castro Luna R, et al. A deep learning system for synthetic knee magnetic resonance imaging. Invest Radiol 2021; 56: 357–68.
12. Kong SH, Shin CS. Applications of machine learning in bone and mineral research. Endocrinol Metab 2021; 36: 928–37.
13. Oliveira e Carmo L, van den Merkhof A, Olczak J, et al.; Machine Learning Consortium. An increasing number of convolutional neural networks for fracture recognition and classification in orthopaedics: Are these externally validated and ready for clinical application? Bone Jt Open 2021; 2: 879–85.
14. Wynants L, Van Calster B, Collins GS, et al. Update to living systematic review on prediction models for diagnosis and prognosis of covid-19. BMJ 2021; 372: n236. https://doi.org/10.1136/bmj.n236.
15. Freeman K, Geppert J, Stinton C, et al. Use of artificial intelligence for image analysis in breast cancer screening programmes: Systematic review of test accuracy. BMJ 2021; 374: n1872.
16. SPIRIT Statement. [Cited 24 Nov 2021]. Available from URL: https://www.spirit-statement.org.
17. CONSORT Statement. [Cited 24 Nov 2021]. Available from URL: http://www.consort-statement.org.
18. Cruz Rivera S, Liu X, Chan AW, et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: The SPIRIT-AI extension. Nat Med 2020; 26: 1351–63.
19. Liu X, Rivera SC, Moher D, Calvert MJ, Denniston AK. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: The CONSORT-AI extension. BMJ 2020; 370: m3164.
20. Collins GS, Dhiman P, Andaur Navarro CL, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open 2021; 11: e048008.
21. Sounderajah V, Ashrafian H, Golub RM, et al. Developing a reporting guideline for artificial intelligence-centred diagnostic test accuracy studies: The STARD-AI protocol. BMJ Open 2021; 11: e047709.
22. Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: Systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020; 368: m689. https://doi.org/10.1136/bmj.m689.
23. [Cited 22 Nov 2021]. Available from URL: https://www.fda.gov/medical-devices/software-medical-device-samd/good-machine-learning-practice-medical-device-development-guiding-principles?utm_medium=email&utm_source=govdelivery.
24. [Cited 24 Nov 2021]. Available from URL: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device.
25. Pianykh OS, Langs G, Dewey M, et al. Continuous learning AI in radiology: Implementation principles and early applications. Radiology 2020; 297: 6–14.
26. Gilbert S, Fenech M, Hirsch M, Upadhyay S, Biasiucci A, Starlinger J. Algorithm change protocols in the regulation of adaptive machine learning-based medical devices. J Med Internet Res 2021; 23: e30545. https://doi.org/10.2196/30545.
27. [Cited 24 Nov 2021]. Available from URL: https://www.tga.gov.au/resource/regulatory-changes-software-based-medical-devices.
28. Available from URL: https://www.tga.gov.au/regulation-software-based-medical-devices.
29. Reddy S, Allan S, Coghlan S, Cooper P. A governance model for the application of AI in health care. J Am Med Inform Assoc 2020; 27: 491–7.
30. [Cited 19 Nov 2021]. Available from URL: https://www.ranzcr.com/college/document-library/ethical-principles-for-ai-in-medicine.
31. [Cited 19 Nov 2021]. Available from URL: https://www.ranzcr.com/college/document-library/standards-of-practice-for-artificial-intelligence.
32. Ghassemi M, Oakden-Rayner L, Beam AL. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit Health 2021; 3: e745–50.
33. Frazer HM, Qin AK, Pan H, Brotchie P. Evaluation of deep learning-based artificial intelligence techniques for breast cancer detection on mammograms: Results from a retrospective study using a BreastScreen Victoria dataset. J Med Imaging Radiat Oncol 2021; 65: 529–37.
34. Reddy S, Fox J, Purohit MP. Artificial intelligence-enabled healthcare delivery. J R Soc Med 2019; 112: 22–8.
35. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019; 17: 195.
36. Amann J, Blasimme A, Vayena E, Frey D, Madai VI; Precise4Q consortium. Explainability for artificial intelligence in healthcare: A multidisciplinary perspective. BMC Med Inform Decis Mak 2020; 20: 310.
37. Larson D, Boland G. Imaging quality control in the era of artificial intelligence. J Am Coll Radiol 2019; 16: 1259–66.
38. Pierce JD, Gupta A, Rosipko B, et al. Seamless integration of artificial intelligence into the clinical environment: Our experience with a novel pneumothorax detection AI algorithm. J Am Coll Radiol 2021; 18: 1497–505.


39. Guimaraes CV, Grzeszczuk R, Bisset GS, et al. Comparison between manual auditing and a natural language process with machine learning algorithm to evaluate faculty use of standardized reports in radiology. J Am Coll Radiol 2018; 15: 550–3.
40. Filice RW, Mongan J, Kohli MD. Evaluating artificial intelligence systems to guide purchasing decisions. J Am Coll Radiol 2020; 17: 1405–9.
41. Bhatia N, Trivedi H, Safdar N, Heilbrun ME. Artificial intelligence in quality improvement: Reviewing uses of artificial intelligence in noninterpretative processes from clinical decision support to education and feedback. J Am Coll Radiol 2020; 17: 1382–7.
42. Carter SM, Rogers W, Win KT, Frazer H, Richards B, Houssami N. The ethical, legal and social implications of using artificial intelligence systems in breast cancer care. Breast 2020; 49: 25–32.
