
Articles

Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: a retrospective simulation study
Karin Dembrower, Erik Wåhlin, Yue Liu, Mattie Salim, Kevin Smith, Peter Lindholm, Martin Eklund, Fredrik Strand

Lancet Digital Health 2020; 2: e468–74

Department of Physiology and Pharmacology (K Dembrower MD, P Lindholm PhD), Department of Pathology and Oncology (M Salim MD, F Strand PhD), and Department of Medical Epidemiology and Biostatistics (M Eklund PhD), Karolinska Institute, Stockholm, Sweden; Department of Radiology, Capio Sankt Görans Hospital, Stockholm, Sweden (K Dembrower); Department of Medical Radiation Physics and Nuclear Medicine (E Wåhlin MSc), Department of Radiology (M Salim), and Breast Radiology (F Strand), Karolinska University Hospital, Stockholm, Sweden; and Department of Computational Science and Technology, KTH Royal Institute of Technology and Science for Life Laboratory, Stockholm, Sweden (Y Liu MSc, K Smith PhD)

Correspondence to: Dr Karin Dembrower, Capio Sankt Görans Hospital, 112 81 Stockholm, Sweden. karin.dembrower@ki.se

Summary
Background We examined the potential change in cancer detection when using an artificial intelligence (AI) cancer-detection software to triage certain screening examinations into a no radiologist work stream, and then, after regular radiologist assessment of the remainder, triage certain screening examinations into an enhanced assessment work stream. The purpose of enhanced assessment was to simulate selection of women for more sensitive screening promoting early detection of cancers that would otherwise be diagnosed as interval cancers or as next-round screen-detected cancers. The aim of the study was to examine how AI could reduce radiologist workload and increase cancer detection.

Methods In this retrospective simulation study, all women diagnosed with breast cancer who attended two consecutive screening rounds were included. Healthy women were randomly sampled from the same cohort; their observations were given elevated weight to mimic a frequency of 0·7% incident cancer per screening interval. Based on the prediction score from a commercially available AI cancer detector, various cutoff points for the decision to channel women to the two new work streams were examined in terms of missed and additionally detected cancer.

Findings 7364 women were included in the study sample: 547 were diagnosed with breast cancer and 6817 were healthy controls. When including 60%, 70%, or 80% of women with the lowest AI scores in the no radiologist stream, the proportions of screen-detected cancers that would have been missed were 0%, 0·3% (95% CI 0·0–4·3), or 2·6% (1·1–5·4), respectively. When including 1% or 5% of women with the highest AI scores in the enhanced assessment stream, the potential additional cancer detection was 24 (12%) or 53 (27%) of 200 subsequent interval cancers, respectively, and 48 (14%) or 121 (35%) of 347 next-round screen-detected cancers, respectively.

Interpretation Using a commercial AI cancer detector to triage mammograms into no radiologist assessment and enhanced assessment could potentially reduce radiologist workload by more than half, and pre-emptively detect a substantial proportion of cancers otherwise diagnosed later.

Funding Stockholm City Council.

Copyright © 2020 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.

Introduction
Population-wide breast cancer screening programmes have been successful in lowering breast cancer mortality, and adherence to regular screening is high.1,2 However, there are two issues that have not been fully addressed: the massive number of radiologist hours spent on assessing mainly healthy women, and the relatively large proportion of women whose cancer is not detected during screening, despite regular screening participation. Among 1000 women who attend one screening examination in biennial screening programmes, approximately five women will have their cancer detected, and two women will have a normal screening assessment and be diagnosed clinically with breast cancer in the interval before the next planned screening (interval cancer).3 To further add to the improvement potential in early detection, around 20% of screen-detected cancers are large (>2 cm), making it probable that many of these cancers could have been detected on previous screening if MRI had been used.4–6 However, MRI is costly, time consuming, and not realistic to use as a first-line screening modality for all women. There is a need for a triaging model to identify the mammograms for which radiologist assessment is unnecessary and to identify the women at highest risk of leaving screening facilities with undetected cancer.

Several artificial intelligence (AI) cancer-detection software algorithms have been developed for mammography.7–9 Some software algorithms are now at a performance level comparable to radiologists in assessing mammograms, even though validation in a truly representative screening cohort is still lacking.10,11 There are several potential roles for AI in the screening process, which have not yet been fully investigated. Using an AI cancer detector as a concurrent assistant to a radiologist has the potential to find additional cancers, but will not meet the aim of reducing radiologist resources.12,13 Therefore, our research focuses on AI as an independent reader.


Research in context

Evidence before this study
Several artificial intelligence (AI) cancer-detection software algorithms have been developed for mammography. There is evidence that some software algorithms are now at a performance level comparable to radiologists in assessing mammograms, even though validation in a true screening cohort is still absent. In addition to assisting the radiologist in tumour detection, there are several potential roles for AI in the screening process, which have not yet been investigated fully.

Added value of this study
In this retrospective simulation study we show that a commercial AI cancer-detector algorithm could be used in triaging mammograms to decrease radiologist time spent on clearly negative mammograms, and use these resources for women at risk of having a false-negative screening. After AI algorithm scoring of mammograms, the lowest 60% could be triaged to a no radiologist work stream without missing any cancer that would otherwise have been screen detected. Then, after negative radiologist assessment of the remaining mammograms, the highest AI scores could be used to identify mammograms for an enhanced assessment work stream with a substantial enrichment of false-negative assessments.

Implications of all the available evidence
Commercial AI algorithms as independent readers of screening mammography assessment are now performing on a clinically relevant level. AI-based scoring can be used to reallocate radiologist time from clearly negative mammograms towards cases where cancer might go undetected. AI has the potential to promote early detection and thereby increase overall survival for breast cancer patients.

In this retrospective simulation study, we examined triaging based on two complementary roles for a commercially available AI cancer detector: as the first and only reader to dismiss the majority of normal mammograms (no radiologist work stream), and as the final reader after a negative examination to identify women at highest risk of undetected cancer (enhanced assessment work stream). Using the same AI algorithm to assign examinations to the no radiologist and to the enhanced assessment work streams ensures there is no overlap—ie, that one examination does not go into more than one work stream. The simulated work process is described in figure 1. The AI cancer detector analysed each screening mammogram and generated a numerical score related to the likelihood of cancer signs in the image (AI score). We hypothesised that a substantial proportion of the population with the lowest AI scores could be safely ruled out without missing cancers that would otherwise have been screen detected by a radiologist. In addition, we hypothesised that women with the highest AI scores after a negative examination would have a high proportion of cancer occurring in the interval between screenings, which would have been potentially detectable by MRI or another modality.

[Figure 1: Simulated triaging workflow. Flow chart: screened female cohort → AI score rule out → no radiologist work stream (60%, lowest AI scores); remaining 40% → double reading and consensus (3% abnormal mammogram → diagnostic work-up; 37% normal mammogram) → AI score rule in → enhanced assessment for supplemental imaging (2–5%, highest AI scores) → supplemental modality (eg, MRI); normal MRI → healthy (32–35%); abnormal MRI → diagnostic work-up → breast cancer detected.] The continuous prediction score between 0 and 1 from a commercial AI cancer detector algorithm is used to triage women into a no radiologist work stream when having a score below a rule-out threshold, and into an enhanced assessment work stream when having a score above a rule-in threshold (after negative double reading by radiologists). The majority of women could be triaged to the no radiologist work stream without missing any cancer that would have been screen detected by radiologists. Among the women triaged to enhanced assessment (eg, by MRI), a large proportion of subsequent interval cancers and next-round screen-detected cancers would be contained. This figure is based on the finding that the bottom 60% did not contain any cancer that would otherwise have been screen detected (table 1); therefore, we chose to channel 60 of 100 examinations to no radiologist assessment. The rest of the percentages are expected consequences based on this initial choice. AI=artificial intelligence.
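
As a rough illustration of the two-threshold workflow in figure 1, the sketch below (Python) shows how a single AI score can route an examination into exactly one work stream. The threshold values and function names are our own placeholders, not part of the published algorithm; in the study the cutoffs correspond to percentiles of the AI score distribution.

```python
# Minimal sketch of the two-stage triage rule in figure 1 (illustrative only).
RULE_OUT_THRESHOLD = 0.04   # assumed value; the study uses score percentiles, not fixed numbers
RULE_IN_THRESHOLD = 0.90    # assumed value

def triage_before_reading(ai_score: float) -> str:
    """Stage 1: the lowest-scoring examinations are dismissed without radiologist reading."""
    return "no_radiologist" if ai_score < RULE_OUT_THRESHOLD else "double_reading"

def triage_after_double_reading(ai_score: float, recalled: bool) -> str:
    """Stage 2: after negative double reading, the highest scores go to enhanced assessment (eg, MRI)."""
    if recalled:
        return "diagnostic_work_up"
    return "enhanced_assessment" if ai_score >= RULE_IN_THRESHOLD else "routine_next_round"
```

Because the rule-out stage takes the lowest scores and the rule-in stage the highest, no examination can enter both new work streams, which is the non-overlap property noted above.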


Methods
Study population
The sample group included screening-aged (40–74 years) women (Cohort of Screen-Aged Women [CSAW]) within the Stockholm county area, who had mammograms between 2008 and 2016. These criteria saw the inclusion of around 500 000 women, more than 1 million images, and around 10 000 breast cancer cases.14 The current study sample was derived from the case-control subset of CSAW and consisted of women from the Karolinska University Hospital uptake area examined between Feb 10, 2009, and Dec 10, 2015. Exclusions are described in figure 2. The main exclusion criteria were that healthy women must have had at least 2 years follow-up, and must have participated in two consecutive screening rounds within 2·5 years to enable image analysis based on the previous screening mammogram. The final study sample consisted of 7364 women: 547 diagnosed women and 6817 healthy controls. The Regional Cancer Center Stockholm-Gotland reported that during the study time, the participation rate varied between 71% and 75%. Two radiologists assessed each mammography examination, and determined whether it was normal or whether there was a suspicious finding. If there was a suspicious finding by any of the radiologists, the examination went on to a consensus discussion, in which it was decided whether the woman would be recalled or not. During the study time, the recall rate varied between 2·0% and 2·6%. For this retrospective study, the research was approved by the Swedish ethical review board, which waived the need for individual informed consent.

[Figure 2: Study population flow chart. Healthy arm: random sampling of 10 000 women; 3183 excluded (1249 without two consecutive examinations, 995 with <2 years follow-up, 909 examined after Dec 31, 2015, 4 with unknown radiologists, 26 with implants); 6817 healthy women included in the final study sample. Breast cancer arm: 747 women diagnosed with breast cancer*; 200 excluded (177 without two consecutive screening rounds, 22 with >2·5 years between last image and diagnosis, 1 with the mammogram acquired after diagnosis); 547 women with breast cancer included in the final study sample. *All women who were diagnosed with breast cancer at Karolinska University Hospital between 2010 and 2015, within the screening age range of 40–74 years, with complete screening examination, without previous breast cancer and without implants.]
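
Purely to make the eligibility rules above concrete, a filtering sketch is given below (Python/pandas). The DataFrame and every column name are hypothetical and are not drawn from the CSAW data model.

```python
# Hypothetical sketch of the main inclusion criteria described above.
import pandas as pd

def eligible(women: pd.DataFrame) -> pd.DataFrame:
    """Keep rows satisfying the study's main eligibility rules (all columns assumed)."""
    common = (
        (women["n_consecutive_rounds"] >= 2)        # two consecutive screening rounds ...
        & (women["months_between_rounds"] <= 30)    # ... within 2.5 years of each other
        & ~women["has_implants"]
    )
    healthy_ok = (
        ~women["diagnosed"]
        & (women["followup_years"] >= 2)                           # at least 2 years follow-up
        & (women["last_exam_date"] <= pd.Timestamp("2015-12-31"))
    )
    cancer_ok = women["diagnosed"] & women["exam_before_diagnosis"]
    return women[common & (healthy_ok | cancer_ok)]
```
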
Images
All included women had a complete four-view, full-field digital mammography examination. All mammograms were acquired on Hologic equipment (Hologic, Marlborough, MA, USA). Percent mammographic density was estimated by the publicly available LIBRA software, version 1.0.4, from the University of Pennsylvania (Philadelphia, PA, USA).15
Deep neural network
The AI cancer detector algorithm used for detection of tumour signs was sourced from a commercial vendor (Lunit, Seoul, South Korea), version 5.5.0.16 The vendor has stated that the algorithm was originally trained on 170 230 mammograms, from 36 468 women diagnosed with breast cancer and 133 762 healthy controls. The mammograms used in the original training came from five institutions: three from South Korea, one from the USA, and one from the UK.16 The mammograms were acquired on equipment from GE Healthcare, Hologic, and Siemens, and consisted of both screening and diagnostic mammograms. In the standard version, the AI cancer-detector software visually highlights areas in the mammograms where the suspicion of tumour is above a certain threshold. However, in this study we did not use the images, but instead used the underlying prediction score of the algorithm. The generated prediction score for tumour presence was a decimal number between 0 and 1, where 1 represented the highest level of suspicion. The AI score on examination level was defined by the maximum of the image-level prediction scores.
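
The examination-level score described above can be expressed compactly. The snippet below is our own illustration (the vendor's implementation is not public), assuming one prediction score per acquired view.

```python
# Examination-level AI score = maximum of the image-level prediction scores.
from typing import Iterable

def examination_score(image_scores: Iterable[float]) -> float:
    """Combine per-image scores (one per view, each in [0, 1]) into one examination score."""
    scores = list(image_scores)
    if not scores or any(s < 0.0 or s > 1.0 for s in scores):
        raise ValueError("expected per-image scores in the range [0, 1]")
    return max(scores)

# Example with an invented four-view examination:
exam_score = examination_score([0.02, 0.03, 0.61, 0.08])   # -> 0.61
```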


Statistical analysis
Since our study sample was enriched with positive cases, we applied an 11-times up-sampling of healthy women to mimic the ratio in the source screening cohort (approximately 0·7% incident cancer per screening interval).3 To examine the AI cancer detector in a rule-out role (no radiologist work stream), we determined the number and percentage of women diagnosed with screen-detected breast cancer for each decile of AI score, to understand how many would be missed in relation to the population proportion included. For diagnosed women, we also defined the AI score separately for the ipsilateral and contralateral breast, to examine association with the same side as the cancer was verified. We tested for statistical differences with the Student's t test. The analysis then focused entirely on women with negative screening examinations. The analysis was based on complete-case analysis.

To examine the AI cancer detector in a rule-in role (enhanced assessment work stream), the AI score of women with a negative examination after radiological assessment was divided into percentiles starting with the highest score. We examined alternative definitions of what proportion of the highest scores would be included in the enhanced assessment work stream: top 1%, 2%, 5%, 10%, 15%, or 20%. For each alternative, we determined the number of subsequent interval cancers and next-round screen-detected cancers according to the recorded radiologist assessments. The potential additional cancer detection rate was calculated as the number of these subsequent cancers divided by the total number of women included in the work stream. The potential additional cancer detection rate for women whose AI score was not in the selected top proportion would be 0 because they would have no additional examination.
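
The counting just described can be sketched as follows (Python/pandas). The DataFrame layout and column names are our assumptions, and quantile cutoffs stand in for the decile and percentile tabulations reported here.

```python
# Illustrative sketch of the up-weighting and threshold counting described above.
# Assumed columns: 'ai_score' (float), 'outcome' in {'healthy', 'screen_detected',
# 'interval', 'next_round'}, and 'recalled' (bool, outcome of double reading).
import pandas as pd

HEALTHY_WEIGHT = 11  # up-sampling factor used to mimic ~0.7% incident cancer per interval

def simulated_population(df: pd.DataFrame) -> pd.DataFrame:
    """Repeat each healthy woman 11 times so proportions reflect the source cohort."""
    healthy = df[df["outcome"] == "healthy"]
    cancers = df[df["outcome"] != "healthy"]
    return pd.concat([pd.concat([healthy] * HEALTHY_WEIGHT), cancers], ignore_index=True)

def missed_screen_detected(pop: pd.DataFrame, lowest_fraction: float) -> int:
    """Rule-out role: screen-detected cancers among the lowest-scoring fraction."""
    cutoff = pop["ai_score"].quantile(lowest_fraction)
    ruled_out = pop[pop["ai_score"] <= cutoff]
    return int((ruled_out["outcome"] == "screen_detected").sum())

def additional_detection_rate(pop: pd.DataFrame, top_fraction: float) -> float:
    """Rule-in role: subsequent cancers per woman sent to enhanced assessment."""
    negatives = pop[~pop["recalled"]]                       # negative after double reading
    cutoff = negatives["ai_score"].quantile(1 - top_fraction)
    enhanced = negatives[negatives["ai_score"] >= cutoff]
    subsequent = enhanced["outcome"].isin(["interval", "next_round"]).sum()
    return subsequent / len(enhanced)                       # multiply by 1000 for a per-1000 rate
```
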
To explore alternative prediction models for the enhanced assessment work stream, we fitted logistic regression models for the traditional predictor mammographic density as well as for AI score. We estimated odds ratios (ORs) with 95% CI for both breasts, and calculated the area under the receiver-operating curve (AUC). We examined the net effect on cancer detection by scenarios of combining various population proportions for the no radiologist work stream (less resources but potentially missing some cancers) and for the enhanced assessment work stream (more resources and potentially increasing cancer detection).
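
For the alternative-predictor comparison, a per-SD odds ratio and AUC can be estimated along the following lines. This is our own sketch using statsmodels and scikit-learn; the age adjustment applied to percent density in the published models is omitted for brevity.

```python
# Sketch of a per-SD odds ratio with 95% CI and model AUC for one predictor.
# 'x' is the predictor (AI score or percent density); 'y' is 1 for a subsequent
# interval or next-round screen-detected cancer, 0 otherwise.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

def per_sd_odds_ratio(x: np.ndarray, y: np.ndarray):
    """Return (OR per SD, (CI low, CI high), AUC) from a univariable logistic model."""
    z = (x - x.mean()) / x.std()          # standardise so the OR is per SD of the predictor
    design = sm.add_constant(z)
    fit = sm.Logit(y, design).fit(disp=0)
    lo, hi = fit.conf_int()[1]            # 95% CI for the predictor coefficient
    auc = roc_auc_score(y, fit.predict(design))
    return np.exp(fit.params[1]), (np.exp(lo), np.exp(hi)), auc
```
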
Role of the funding source
The funding source, Stockholm County Council, provided funds for the entire project but had no influence over how any aspect of the work was carried out. The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Results
The study population is described in the appendix (p 2). 7364 women were included: 547 diagnosed with breast cancer, and 6817 healthy controls. 347 cancers were detected at screening and 200 were detected clinically as interval cancers. The median age at mammography was 53·6 years (IQR 15·4). The simulated screening population contained 75 534 women, resulting in 0·74% cancer incidence over one screening interval. The median and dispersion of AI scores are reported in the appendix (p 3).

Table 1 accounts for a scenario in which the AI cancer detector assessed all screening examinations as a single reader without radiologists (ie, the no radiologist work stream). We determined that the AI score did not miss any cancer that would otherwise have been screen detected for mammograms with the 60% lowest AI scores. For the 70%, 80%, and 90% lowest AI scores, there were one, nine, and 14 missed cancers, respectively, which corresponded to 0·3%, 2·6%, and 4·0%, respectively, of all screen-detected cancers in the population.

             n     Proportion (95% CI)
Lowest 10%   0     0 (NA)
Lowest 20%   0     0 (NA)
Lowest 30%   0     0 (NA)
Lowest 40%   0     0 (NA)
Lowest 50%   0     0 (NA)
Lowest 60%   0     0 (NA)
Lowest 70%   1     0·3% (0·0–4·3)
Lowest 80%   9     2·6% (1·1–5·4)
Lowest 90%   14    4·0% (2·1–6·9)
All          347   100·0% (NA)

AI computer-aided detection score shows the upper cut-point for the no radiologist work stream. n=74 987 healthy women and 547 cancer diagnoses. NA=not applicable.

Table 1: Number of screen-detected cancers that would be missed in the no radiologist work stream depending on the proportion of the population lowest scores included

In the enhanced assessment work stream, we showed that among the top 1% of AI scores for women with negative mammograms after double reading, there were 24 (12%) interval cancers and 48 (14%) next-round screen-detected cancers (table 2). The results for the top 5% were 53 (27%) interval cancers and 121 (35%) next-round screen-detected cancers. Expressing the total potential detection, both interval cancers and screen-detected cancers, as a detection rate corresponded to 114 per 1000 women within the top 1% AI scores and 34 per 1000 women within the top 5% AI scores. The raw AI scores for the cutoff points for the two novel work streams are presented in the appendix (p 4).

                        Interval cancer   Next-round screen-detected   Cancer of both       Additional cancer
                        (n=200)           cancer (n=347)               categories (n=547)   detection rate*
Highest 1% (n=633)      24 (12%)          48 (14%)                     72 (13%)             114/1000
Highest 2% (n=1445)     32 (16%)          71 (21%)                     103 (19%)            71/1000
Highest 5% (n=5073)     53 (27%)          121 (35%)                    174 (32%)            34/1000
Highest 10% (n=8746)    73 (37%)          155 (45%)                    228 (42%)            26/1000
Highest 15% (n=12 571)  86 (43%)          183 (53%)                    269 (49%)            21/1000
Highest 20% (n=16 181)  100 (50%)         204 (59%)                    304 (56%)            19/1000
All (n=75 534)          200 (100%)        347 (100%)                   547 (100%)           7/1000

Data are n (%) or n/n. *The ratio was calculated with the total number of women in the population selected as the denominator.

Table 2: Potential detection of interval and next-round screen-detected cancer in the enhanced assessment work stream depending on the proportion of the population highest scores (after negative double-reading) included

Alternative prediction models for the enhanced assessment work stream are shown in table 3. The OR for predicting interval cancer was 2·01 (95% CI 1·98–2·18; AUC 0·74) and 1·59 (1·50–1·68; 0·67) for maximum AI score and mammographic density, respectively. The corresponding numbers for predicting next-round screen-detected cancer were 2·29 (2·22–2·38; 0·76) and 1·12 (1·06–1·18; 0·65) for maximum AI score and mammographic density, respectively. The OR was markedly higher for the breast containing the cancer than for the other breast when using the AI score, while the OR was similar between breasts for mammographic density.

                                  OR (95% CI)          AUC    p value*
Interval cancer†
  AI score
    Maximum of left and right     2·01 (1·98 to 2·18)  0·74   <0·001
    Ipsilateral                   1·95 (1·85 to 2·04)  0·79   <0·001
    Contralateral                 1·04 (0·91 to 1·20)  0·54   <0·001
  Percent density, age adjusted
    Maximum of left and right     1·59 (1·50 to 1·68)  0·67   <0·001
    Ipsilateral                   1·56 (1·45 to 1·69)  0·67   <0·001
    Contralateral                 1·51 (1·40 to 1·63)  0·65   <0·001
Screen-detected cancer next round‡
  AI score
    Maximum of left and right     2·29 (2·22 to 2·38)  0·76   <0·001
    Ipsilateral                   2·11 (2·03 to 2·19)  0·80   <0·001
    Contralateral                 1·00 (0·90 to 1·13)  0·49   <0·001
  Percent density, age adjusted
    Maximum of left and right     1·12 (1·06 to 1·18)  0·65   <0·001
    Ipsilateral                   1·09 (1·01 to 1·17)  0·65   <0·001
    Contralateral                 1·12 (1·05 to 1·21)  0·65   <0·001

AI score is the prediction score from a commercial mammography AI cancer-detection software. Logistic regression modelling was used to estimate per SD ORs. AUC was estimated per model (including age for age-adjusted percent density). OR=odds ratio. AUC=area under the receiver-operating curve. AI=artificial intelligence. *p value for AUC comparison between the AI score and the percent density model. †n=6817 healthy women; 200 women diagnosed with interval cancer. ‡n=6817 healthy women; 347 women with screen-detected cancer in the next screening.

Table 3: AI score and mammographic percent density as alternative predictors for triaging into the enhanced assessment work stream detecting subsequent interval cancers and next-round screen-detected cancers

The potential net change in cancer detection when using AI score to save resources (no radiologist work stream) and to increase potential cancer detection (enhanced assessment work stream) is shown in table 4, based on results from tables 1 and 2. If no radiologist resources were used for 90% of women with the lowest AI scores and were invested into doing MRI for the top 2% AI scores (that were negative after radiologist double reading of the mammograms), a net of 89 of 547 cancers would potentially have been detected up to 2 years earlier, corresponding to a detection rate of 59 cancers per 1000 supplemental screening examinations.

                       Top 1% AI score   Top 2% AI score   Top 5% AI score   Top 10% AI score
Bottom 50% AI score    72 (95/1000)      103 (68/1000)     174 (46/1000)     228 (30/1000)
Bottom 60% AI score    72 (95/1000)      103 (68/1000)     174 (46/1000)     228 (30/1000)
Bottom 70% AI score    71 (94/1000)      102 (68/1000)     173 (46/1000)     227 (30/1000)
Bottom 80% AI score    63 (83/1000)      94 (62/1000)      165 (44/1000)     219 (29/1000)
Bottom 90% AI score    58 (77/1000)      89 (59/1000)      160 (42/1000)     214 (28/1000)

Data are n (detection rate). n=75 534 women, of which 547 were diagnosed with breast cancer. Net number of additional cancers (detection rate per 1000 examinations) calculated by proportion of women in the no radiologist work stream and proportion of women in the enhanced assessment work stream. AI=artificial intelligence.

Table 4: Number of cancers detected earlier by the enhanced assessment work stream after subtracting screen-detected cancers missed in the no radiologist work stream
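
To make the net-change arithmetic behind table 4 explicit, a minimal sketch follows (our own helper function; we assume the denominator is the chosen top fraction of the simulated population of 75 534 women). For the bottom 90% / top 2% scenario quoted above, 103 subsequent cancers caught minus 14 screen-detected cancers missed gives a net of 89, and 89 / (0·02 × 75 534) is approximately 59 per 1000 supplemental examinations, matching table 4.

```python
# Net earlier-detected cancers and rate per 1000 enhanced (supplemental) examinations.
def net_cancer_detection(additional_detected: int,
                         screen_detected_missed: int,
                         population_size: int,
                         enhanced_fraction: float):
    net = additional_detected - screen_detected_missed
    n_enhanced = enhanced_fraction * population_size   # assumed denominator: top x% of the population
    return net, 1000 * net / n_enhanced

net, rate = net_cancer_detection(103, 14, 75_534, 0.02)   # -> (89, ~58.9)
```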



Discussion
In this study we show that a commercial AI cancer-detector algorithm could be used both as a single reader to determine easily read, negative mammograms with no radiologist involvement, and to select women for enhanced supplemental screening after a negative double reading by radiologists. Within the no radiologist work stream, among the women with the lowest 60% of scores, the AI cancer-detector algorithm would not miss any cancer that would otherwise have been screen detected. Within the enhanced assessment work stream, after a negative double reading, among the women with the highest 2% of AI scores there would be a potential additional cancer detection rate of 71 per 1000 examinations. An important aspect of our work is that the evaluated AI algorithm has not been trained on any images from our institution at any timepoint.

Using the AI cancer-detector algorithm to triage women into the no radiologist assessment by the single-reader rule out obviates the use of a radiologist in the assessment of those mammograms, and at the 60% score level, our results show that this programme would be successful. For the rule-out role, the performance of the current algorithm is superior to the 19% cancer-free examinations reported in a previous study.17 Even when we triaged 90% of the women with the lowest AI scores to no radiologist assessment, only 4% of the screen-detected cancers would have been missed. This percentage is small compared with the ratio of interval cancers to screen-detected cancers of around 40% in biennial screening (28% of all cancers).3 Whether choosing a 60% or 90% population threshold, a massive reduction in radiologist workload would result, and AI as an independent rule-out reader has great potential.

We also examined the potential number of additional cancers detected using the AI cancer-detector algorithm to triage patients into the enhanced assessment work stream by the final-reader rule in after a negative mammogram. For each threshold of assigning enhanced assessment, we found the relative reduction of subsequent interval cancers and of next-round screen-detected cancers was of similar magnitude. If the examined method were implemented clinically, the reduction in interval cancer would most likely be apparent as a continuously lower number of interval cancers. However, earlier detection of next-round screen-detected cancers would mainly be apparent as an increased number of screen-detected cancers at the first screening. In the continuation, the number of screen-detected cancers would not be reduced, but a stage shift towards smaller cancers could be expected. We found that the accuracy in predicting future interval cancer and next-round screen-detected cancer was markedly higher for the AI computer-aided detection score than for mammographic density. Mammographic density has previously been established as a strong risk factor for interval cancer.18,19
In many US states, legislation requires that women with high mammographic density should be informed that they are at risk of reduced mammographic sensitivity, and should discuss supplemental screening with their health-care provider.20 Kerlikowske and colleagues21 discuss that it would not be acceptable to leave women with an interval cancer rate above 1 per 1000 without further examination. They found that when combining density with a traditional breast cancer risk model, there were 35 interval cancer cases among the 24 294 women identified by their model in a simulated screening cohort of 100 000 women. However, their additional cancer detection rate of 1·4 per 1000 examinations is much below the 6·2 per 1000 examinations that would result from preventing all interval cancers for the 20% of women with the highest AI scores in our study. The US study and our study differ in at least two aspects: in the USA, screening is mostly annual, not biennial as in our study, and the interval cancer rate is around 13% according to the US Breast Cancer Surveillance Consortium.22

The AI algorithm and mammographic density might capture complementary explanatory factors for interval cancer. We speculate that the AI algorithm finds subtle tumour markers that were previously unidentified, while mammographic density is most likely associated with the risk of masking, or obstruction, of tumour signs. A useful feature of the AI cancer detector is that the software produces an image with a marker for the localisation of the suspicious finding in the mammography image. Therefore, one could consider routing women with a high AI score to targeted ultrasound examination, guided by the localisation shown by the AI cancer-detector software. Based on previous studies using MRI-guided localisation for second-look ultrasound, there is reason to believe that many cancers could be detected.23 In a clinical setting, there are many considerations (eg, local availability, radiological expertise, and economic considerations) to take into account when deciding whether the supplemental method should include one or all of MRI, contrast-enhanced mammography, or AI cancer-detector-guided ultrasound.

The resources saved by the no radiologist triage consist of the radiologist assessments and discussions of screening mammograms, while the resources expended for the enhanced assessment triaging include doing the ultrasound or MRI examination and the following assessments, discussions, and further work-up. The number of averted radiologist assessments required to finance one MRI examination varies by country and setting. However, even if it were necessary to save 90% of the mammography assessments to finance MRI examinations for 1% of the population, the net change in cancer detection is still positive by a wide margin. On a population level, the suggested balanced strategy would most probably result in a marked increase in cancers detected early. In a previous study24 we showed that women generally seemed to have positive attitudes towards using a computer program to assess mammograms and to triage for MRI screening. However, a few individual women might end up with clinically detected cancer that, in hindsight, could have been detected by a radiologist on the previous screening mammogram. This could be a starting point for an important conversation between policy makers and screening participants.

An important strength of this study is that the AI algorithm is commercially available, and has never previously been exposed to images from our department or our equipment. Additionally, the cohort of women comes from a population-based screening population. A weakness of our study is that we did not study the full screening cohort, but instead used a case-control design to improve computing efficiency. A limitation of our study was our requirement that all women must have had a previous mammogram not more than 30 months before diagnosis, which consequently affected the proportion of interval cancers (28% before and 37% after these exclusions). A second limitation was that we were not informed of the location of the radiological findings and could therefore not examine whether the AI algorithm finding was at the same location as where the cancer was later found. Another limitation was that all the women were from Sweden, and the results could differ in a population with a different ethnic and geographical composition. Additionally, our programme was based on biennial screening, and results from an annual screening programme could be different. Results in a clinical setting could differ from our study, for example if radiologist performance is affected by knowing that there is an AI algorithm potentially detecting missed cancer signs. A final limitation was that the specific cutoff points for the AI algorithm were derived in our setting, using our equipment and acquisition settings for the mammograms.

In conclusion, our retrospective study shows that using a commercial AI cancer detector to triage mammograms into no radiologist assessment and enhanced assessment could potentially reduce radiologist workload by more than half, and pre-emptively detect a substantial proportion of cancers otherwise diagnosed later. Retrospective trials in other settings and a prospective trial would be needed to validate our findings.

Contributors
All authors have contributed to different parts of the Article. KD searched the literature, collected data, did analysis, interpreted data, and wrote the Article. EW, YL, and MS collected data. KS interpreted data. PL interpreted data and collected data. ME designed the study, interpreted data, and wrote the Article. FS searched the literature, collected data, did analysis, prepared figures, interpreted data, and designed the study. In addition, all authors were involved in drafting the work or revising it critically for important intellectual content, or in the final approval of the version submitted for publication.

Declaration of interests
FS declares receiving consulting fees from Collective Minds Radiology, unrelated to this Article. All other authors declare no competing interests.

Data sharing
All data collected for the study cannot be made publicly available due to Swedish and European regulations, and permission from the original information owner is required for the use of data in research. However, contact the last author (FS; firstname.lastname@ki.se) for academic inquiries into the possibility of applying for access to de-identified data through a Data Transfer Agreement procedure, which will then require permission from the head of the department at Karolinska Institute. The examined AI algorithm is a commercially available third-party product that we have no rights to share.

Acknowledgments
The funding source (Stockholm City Council, grant number 20170802) provided funds for the entire project, but had neither an influence over how any aspect of the work was carried out, nor any impact on the transparency of the Article. We were allowed to use the AI algorithm free of charge by Lunit, South Korea; the company had no influence over the research question, nor any other aspect of the work carried out, nor any impact on the transparency of the Article.


References
1 Shapiro S, Venet W, Strax P, Venet L, Roeser R. Ten- to fourteen-year effect of screening on breast cancer mortality. J Natl Cancer Inst 1982; 69: 349–55.
2 Tabár L, Vitak B, Chen TH-H, et al. Swedish two-county trial: impact of mammographic screening on breast cancer mortality during 3 decades. Radiology 2011; 260: 658–63.
3 Törnberg S, Kemetli L, Ascunce N, et al. A pooled analysis of interval cancer rates in six European countries. Eur J Cancer Prev 2010; 19: 87–93.
4 Strand F, Humphreys K, Holm J, et al. Long-term prognostic implications of risk factors associated with tumor size: a case study of women regularly attending screening. Breast Cancer Res 2018; 20: 31.
5 Warner E, Messersmith H, Causer P, Eisen A, Shumak R, Plewes D. Systematic review: using magnetic resonance imaging to screen women at high risk for breast cancer. Ann Intern Med 2008; 148: 671–79.
6 Morrow M, Waters J, Morris E. MRI for breast cancer screening, diagnosis, and treatment. Lancet 2011; 378: 1804–11.
7 Chan H-P, Samala RK, Hadjiiski LM. CAD and AI for breast cancer: recent development and challenges. Br J Radiol 2020; 93: 20190580.
8 Schaffter T, Buist DS, Lee CI, et al. Evaluation of combined artificial intelligence and radiologist assessment to interpret screening mammograms. JAMA Netw Open 2020; 3: e200265.
9 McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature 2020; 577: 89–94.
10 Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J Natl Cancer Inst 2019; 111: 916–22.
11 Rodríguez-Ruiz A, Krupinski E, Mordang J-J, et al. Detection of breast cancer with mammography: effect of an artificial intelligence support system. Radiology 2019; 290: 305–14.
12 Mayo RC, Kent D, Sen LC, Kapoor M, Leung JWT, Watanabe AT. Reduction of false-positive markings on mammograms: a retrospective comparison study using an artificial intelligence-based CAD. J Digit Imaging 2019; 32: 618–24.
13 Conant EF, Toledano AY, Periaswamy S, et al. Improving accuracy and efficiency with concurrent use of artificial intelligence for digital breast tomosynthesis. Radiol Artif Intell 2019; 1: e180096.
14 Dembrower K, Lindholm P, Strand F. A multi-million mammography image dataset and population-based screening cohort for the training and evaluation of deep neural networks—the cohort of screen-aged women (CSAW). J Digit Imaging 2019; 33: 408–13.
15 Keller BM, Chen J, Daye D, Conant EF, Kontos D. Preliminary evaluation of the publicly available Laboratory for Breast Radiodensity Assessment (LIBRA) software tool: comparison of fully automated area and volumetric density measures in a case-control study with digital mammography. Breast Cancer Res 2015; 17: 117.
16 Kim H-E, Kim HH, Han B-K, et al. Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study. Lancet Digital Health 2020; 2: e138–48.
17 Yala A, Schuster T, Miles R, Barzilay R, Lehman C. A deep learning model to triage screening mammograms: a simulation study. Radiology 2019; 293: 38–46.
18 Boyd NF, Lockwood GA, Byng JW, Tritchler DL, Yaffe MJ. Mammographic densities and breast cancer risk. Cancer Epidemiol Biomarkers Prev 1998; 7: 1133–44.
19 Pollán M, Ascunce N, Ederra M, et al. Mammographic density and risk of breast cancer according to tumor characteristics and mode of detection: a Spanish population-based case-control study. Breast Cancer Res 2013; 15: R9.
20 Freer PE. Mammographic breast density: impact on breast cancer risk and implications for screening. Radiographics 2015; 35: 302–15.
21 Kerlikowske K, Zhu W, Tosteson AN, et al. Identifying women with dense breasts at high risk for interval cancer: a cohort study. Ann Intern Med 2015; 162: 673–81.
22 Lehman CD, Arao RF, Sprague BL, et al. National performance benchmarks for modern screening digital mammography: update from the Breast Cancer Surveillance Consortium. Radiology 2017; 283: 49–58.
23 Ko EY, Han B-K, Shin JH, Kang SS. Breast MRI for evaluating patients with metastatic axillary lymph node and initially negative mammography and sonography. Korean J Radiol 2007; 8: 382–89.
24 Jonmarker O, Strand F, Brandberg Y, Lindholm P. The future of breast cancer screening: what do participants in a breast cancer screening program think about automation using artificial intelligence? Acta Radiol Open 2019; 8: 1–7.
