Raya-Povedano 2021
Background: The workflow of breast cancer screening programs could be improved given the high workload and the high number of
false-positive and false-negative assessments.
Purpose: To evaluate if using an artificial intelligence (AI) system could reduce workload without reducing cancer detection in
breast cancer screening with digital mammography (DM) or digital breast tomosynthesis (DBT).
Materials and Methods: Consecutive screening-paired and independently read DM and DBT images acquired from January 2015 to
December 2016 were retrospectively collected from the Córdoba Tomosynthesis Screening Trial. The original reading settings were
single or double reading of DM or DBT images. An AI system computed a cancer risk score for DM and DBT examinations independently.
Each original setting was compared with a simulated autonomous AI triaging strategy (the least suspicious examinations
for AI are not human-read; the rest are read in the same setting as the original, and examinations not recalled by radiologists but
graded as very suspicious by AI are recalled) in terms of workload, sensitivity, and recall rate. The McNemar test with Bonferroni
correction was used for statistical analysis.
Results: A total of 15 987 DM and DBT examinations (which included 98 screening-detected and 15 interval cancers) from 15 986
women (mean age ± standard deviation, 58 years ± 6) were evaluated. In comparison with double reading of DBT images (568
hours needed, 92 of 113 cancers detected, 706 recalls in 15 987 examinations), AI with DBT would result in 72.5% less workload
(P < .001, 156 hours needed), noninferior sensitivity (95 of 113 cancers detected, P = .38), and 16.7% lower recall rate (P < .001,
588 recalls in 15 987 examinations). Similar results were obtained for AI with DM. In comparison with the original double reading
of DM images (222 hours needed, 76 of 113 cancers detected, 807 recalls in 15 987 examinations), AI with DBT would result in
29.7% less workload (P < .001), 25.0% higher sensitivity (P < .001), and 27.1% lower recall rate (P < .001).
Conclusion: Digital mammography and digital breast tomosynthesis screening strategies based on artificial intelligence systems could
reduce workload by up to 70%.
Published under a CC BY 4.0 license.
Figure 1: Diagram illustrates how digital mammography (DM) and digital breast tomosynthesis (DBT) images (with synthetic mammography [SM]) were read during the
trial, in four independent readings by four radiologists. During the trial, a woman was recalled when any of the four arms recalled the examination (no arbitration or consensus).
The assessments of each reading arm were recorded.
Figure 2: Flowchart of the original screening strategies and how they were compared with the artificial intelligence (AI)–based screening strategy. If the original setting
used digital mammography (DM), AI scores computed on DM images were used. Similarly, if the original setting used digital breast tomosynthesis (DBT), the AI scores computed
on only DBT images were used. Cases were considered very likely normal if the AI score was 7 or lower (approximately 70% of screening volume). Additionally, the
examinations not recalled by radiologists but with an AI score among the 2% most suspicious examinations in the cohort were considered automatically recalled.
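The decision rule summarized in the Figure 2 caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the `Exam` record, the `in_top_2pct` flag (assumed to be precomputed from the cohort-wide ranking of AI suspicion), and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    ai_score: int             # AI examination score, 1 (least) to 10 (most suspicious)
    in_top_2pct: bool         # hypothetical flag: among the 2% most suspicious by AI
    radiologist_recall: bool  # recall decision in the original reading setting


def triage(exam: Exam, readers: int, normal_cutoff: int = 7) -> tuple[int, bool]:
    """Return (number of human reads, recalled?) for one examination."""
    if exam.ai_score <= normal_cutoff:
        # Very likely normal: not human-read, so any original recall is dropped.
        return 0, False
    # Read as in the original setting; AI adds recalls for its top 2%.
    return readers, exam.radiologist_recall or exam.in_top_2pct


def simulate(exams: list[Exam], readers: int) -> tuple[int, int]:
    """Total human reads and total recalls under the triaging rule."""
    reads = recalls = 0
    for exam in exams:
        n_reads, recalled = triage(exam, readers)
        reads += n_reads
        recalls += int(recalled)
    return reads, recalls
```

Here `readers` is 1 for single reading and 2 for double reading. Note that under this rule an examination scored 7 or lower is never recalled, even if a radiologist recalled it originally, which is how the strategy can lose a small number of cancers.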
Because of this interpretation setting, it was possible to compute the performance of the following settings, hereinafter referred to as original screening settings: double reading of DM images (if either reader recalls, the case is recalled), double reading of DBT images (if either reader recalls, the case is recalled), and single reading of DBT images (with synthetic mammograms).

AI System
The AI system used in this study (Transpara, version 1.6.0; ScreenPoint Medical) was previously investigated in other publications (17,19,20,22–24). This system uses deep learning to detect lesions suspicious for breast cancer on DM and DBT images. The most suspicious findings detected by the system are marked on every image and assigned a score between 1 and 100. Based on the maximum suspicious finding present in the examination, a proprietary conversion table generates an examination score from 1 to 10, indicating the increasing likelihood that a visible cancer is present on the mammogram. The DBT images and the DM images of each examination were independently processed by the AI system, resulting in two AI scores per examination: an AI-DM score and an AI-DBT score.

AI-based Screening Strategy
For each original setting, an autonomous AI triaging screening strategy was retrospectively simulated, aiming to reduce workload while maintaining sensitivity (detailed in Fig 2).
In this AI strategy, the least suspicious examinations for AI (those assumed very likely normal with an AI score of 7 or lower, approximately 70% according to the device specifications; the cutoff was chosen based on previous research [20] indicating that replacing double reading with single reading for these very likely normal cases would not reduce screening sensitivity by more than 5%) would not be human-read, and the rest of examinations would be read as in the original setting (single or double reading of DM or DBT images). Additionally, the examinations not recalled by radiologists but within the 2% most suspicious examinations as graded by AI would be automatically recalled in order to potentially improve sensitivity (the cutoff was chosen taking into account radiologists’ recall rate at this site).
The output of the AI triaging was analyzed by a panel of radiologists (J.L.R.P., S.R.M., E.E.C., and M.Á.B., with 20, 8, 3, and 20 years of experience, respectively), and findings were considered true-positive only if the system correctly localized them and assigned them the highest suspicion score at the examination (on the region suspicion scale of 1–100).

Statistical Analysis
First, the distribution of AI examination scores in DM and DBT was computed for different groups of examinations based on ground truth (95% CIs were computed using the Wilson binomial method).
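As an illustration of the Wilson binomial method named in the Statistical Analysis, the following sketch computes a Wilson score interval using only the Python standard library. The function name `wilson_ci` is hypothetical, and the study's own software is not described in the text, so this is a sketch of the general technique rather than the authors' code.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Proportion of DM-based screening-detected cancers scored 1-7 (2 of 76).
low, high = wilson_ci(2, 76)
print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```

Applied to the two of 76 DM-based screening-detected cancers that the Results report as scored 1–7, this gives an interval of roughly 0.7% to 9.1%, in line with the reported 95% CI of 0.72, 9.10.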
The screening reading workload, sensitivity (including screening-detected and interval cancers), and recall rate (ie, the number of examinations recalled by either the AI system or radiologists divided by total examinations) were compared between each original screening setting and the AI-based screening strategy by using the McNemar test for paired data, with an α of .05 indicating statistical significance. Additionally, the AI-based strategy in DBT was compared with the original double reading of DM.
Screening workload was defined as the number of readings, and an estimate in hours was computed using the average reading time per examination originally reported in this cohort (21): 25 seconds for a DM examination and 64 seconds for a DBT plus DM or synthetic mammography examination.
To control for multiple comparisons (four in total; see Fig 2), Bonferroni correction was applied. P = .013 (ie, .05/4) was considered to indicate a significant difference after Bonferroni correction. To control for multiple comparisons of the end point metrics (workload, sensitivity, and recall rate), these were tested sequentially for each comparison.
The hypothesis was that in the AI-based strategy, workload could be significantly reduced, with noninferior sensitivity and recall rate (prespecified noninferiority margin difference of 5%, in relative terms). Noninferiority was concluded if the sensitivity or the recall rate was superior (higher sensitivity, lower recall rate) in the AI-based setting, and the lower limit of the 95% CI of the difference was greater than the negative value of the prespecified noninferiority margin. If noninferiority was concluded, superior sensitivity and recall rate in the AI-based strategies were sequentially tested using the McNemar test.

Characteristic                                        No. of Women (n = 15 986)*
No. of examinations                                   15 987
Age at screening
  50–54 years                                         6173 (38.6)
  55–59 years                                         3800 (23.8)
  60–64 years                                         3335 (20.9)
  64–69 years                                         2678 (16.8)
  Mean age at screening (y)†                          58 ± 6
Breast density‡
  A                                                   3648 (22.8)
  B                                                   8153 (51.0)
  C                                                   3749 (23.5)
  D                                                   436 (2.7)
Original outcomes
  No. of normal readings (with 2-year follow-up)      14 795 (92.5)
  No. of false-positive recalls
    (at either DBT or DM)                             1078 (6.7)
  No. of screening-detected cancers
    (at either DBT or DM)                             98 (0.6)
  No. of interval cancers                             15 (0.1)
Note.—Unless otherwise specified, data are numbers of women (n = 15 986), with percentages in parentheses. DBT = digital breast tomosynthesis, DM = digital mammography.
* One woman had a bilateral breast cancer and had two different examinations included in the original trial, for a total of 15 987 examinations.
† Data are means ± standard deviations.
‡ Breast density was graded according to the American College of Radiology Breast Imaging Reporting and Data System lexicon.

Figure 3: Flowchart of data selection. In total, 15 987 examinations from 15 986 women were included. PACS = picture archiving and communications system.

Table 2: Summary of Cancer Characteristics
Characteristic                  Screening-detected Cancers (n = 98)    Interval Cancers (n = 15)
Morphologic type
  Mass                          54 (55)                                13 (87)
  Architectural distortion      20 (20)                                1 (6.7)
  Asymmetry                     3 (3.1)                                1 (6.7)
  Calcifications                21 (21)                                0 (0)
Histologic type
  Invasive ductal carcinoma     68 (69)                                12 (80)
  Invasive lobular carcinoma    4 (4.1)                                1 (6.7)
  Other invasive                0 (0)                                  1 (6.7)
  Ductal carcinoma in situ      26 (27)                                1 (6.7)
Grade
  I                             45 (46)                                4 (27)
  II                            34 (34)                                6 (40)
  III                           19 (20)                                5 (33)
Size (mm)*                      20.7 ± 14.4                            26.6 ± 17.1
Note.—Unless otherwise specified, data are numbers of cancers, with percentages in parentheses.
* Data are means ± standard deviations.

Results
Participant and Examination Characteristics
From the 16 067 women in the cohort, 15 987 examinations in 15 986 women (mean age ± standard deviation, 58 years ± 6)
Figure 4: Bar graphs show distribution of artificial intelligence (AI) examination scores across different groups of examinations in the paired digital mammography (DM)–
digital breast tomosynthesis (DBT) cohort (all examinations, noncancer recalled examinations, screening-detected cancers, and interval cancers). AI scores were computed
for DM and DBT examinations independently. The ground truth was computed for DM-based screening outcomes and for DBT-based screening outcomes (which includes
DBT plus DM and DBT plus synthetic mammography workflows). For interval cancers, the only difference is whether the AI scores were computed for DM or DBT images.
Table 3: Comparison of the Original Settings and the AI-based Strategy in Terms of Workload, Sensitivity (Cancers Detected), and Recall Rate
Metric           Original Setting without AI      With Simulated Autonomous AI Triaging    Relative Difference*      P Value
DM: double human reading
  Workload†      31 974 (222)                     9100 (63)                                −71.5 (−72.4, −70.6)      <.001‡
  Sensitivity§   67.3 (76/113) [58.2, 75.2]       69.0 (78/113) [60.0, 76.8]               2.63 (−4.9, 11.4)||       .68
  Recall rate§   5.1 (807/15 987) [4.7, 5.4]      4.2 (671/15 987) [3.9, 4.5]              −16.9 (−24.0, −11.0)      <.001‡
DBT: double human reading
  Workload†      31 974 (568)                     8830 (156)                               −72.4 (−72.9, −71.9)      <.001‡
  Sensitivity§   81.4 (92/113) [73.3, 87.5]       84.1 (95/113) [76.2, 89.7]               3.26 (−2.2, 9.4)||        .38
  Recall rate§   4.4 (706/15 987) [4.1, 4.8]      3.7 (588/15 987) [3.4, 4.0]              −16.7 (−23.4, −8.6)       <.001‡
DBT: single human reading
  Workload†      15 987 (284)                     4415 (78)                                −72.4 (−73.3, −71.5)      <.001‡
  Sensitivity§   77.0 (87/113) [68.4, 83.8]       79.6 (90/113) [71.3, 86.0]               3.45 (−1.2, 9.8)||        .38
  Recall rate§   3.0 (481/15 987) [2.8, 3.3]      3.1 (499/15 987) [2.9, 3.4]              3.74 (−3.7, 12.8)         .41
Note.—AI = artificial intelligence, DBT = digital breast tomosynthesis, DM = digital mammography.
* Data are percentages, with 95% CIs in parentheses.
† Unless otherwise specified, data are number of reads, with number of hours in parentheses.
‡ Significant difference.
§ Unless otherwise specified, data are percentages, with raw data in parentheses and 95% CIs in brackets.
|| Noninferior.
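The hours shown in parentheses in the Workload rows of Table 3 follow from the per-examination reading times given in the Statistical Analysis (25 seconds for DM, 64 seconds for DBT). A minimal sketch of that conversion is given below; the dictionary and function names are hypothetical, and how the study rounded its hour estimates is an assumption, since it is not described.

```python
# Average reading time per examination, in seconds, as stated in the Methods.
SECONDS_PER_READ = {"DM": 25, "DBT": 64}


def workload_hours(reads: int, modality: str) -> float:
    """Estimated reading time in hours for a given number of reads."""
    return reads * SECONDS_PER_READ[modality] / 3600


for label, reads, modality in [
    ("DM double, original", 31_974, "DM"),
    ("DM double, with AI", 9_100, "DM"),
    ("DBT double, original", 31_974, "DBT"),
    ("DBT double, with AI", 8_830, "DBT"),
]:
    print(f"{label}: {workload_hours(reads, modality):.1f} h")
```

The original double-reading settings come out at about 222 hours (DM) and 568 hours (DBT), as in the table. The AI-DBT figure computes to just under 157 hours, slightly above the 156 hours reported, which presumably reflects a rounding convention.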
Figure 5: A, Digital mammography (DM) and, B, digital breast tomosynthesis (DBT) images in a 66-year-old woman not recalled during
any of the original readings. Artificial intelligence (AI) identified a spiculated mass (outlined) on images obtained with both techniques during
screening and assigned a region score of 82 at DM and 95 at DBT. This woman would have automatically been recalled only at DBT. C,
DM image obtained 4 months later, after she discovered a palpable lump (not related to the actual cancer). Biopsy was performed, and
an interval cancer, a grade II invasive ductal carcinoma of 6 mm, was diagnosed in the lesion that would have been recalled by AI. The AI
examination score of this case was 10.
were included (99.5%) (Fig 3). Eighty-one examinations (five noncancer recalled examinations and 76 normal examinations) from 81 women were excluded because of problems retrieving the data from the picture archiving and communication system. In total, 113 examinations were labeled as showing cancers (98 screening-detected and 15 interval). The characteristics of the selected cohort are included in Tables 1 and 2.

Distribution of AI Scores
The distribution of AI scores across the different groups of examinations in the cohort is shown in Figure 4, computed for both DM and DBT examinations independently. The distribution of AI scores is homogeneous for all screening examinations (approximately 10% in each score category), whereas only a minority of screening-detected cancers were scored 1–7: two of 76 DM-based screening-detected cancers (2.6%; 95% CI: 0.72, 9.10) and one of 92 DBT-based screening-detected cancers (1.1%; 95% CI: 0.19, 5.90). At the same time, AI examinations scored 1–7 comprise 11 437 of 15 987 of the DM-based screening volume (71.5%; 95% CI: 70.8, 72.2) and 11 572 of 15 987 of the DBT-based screening volume (72.4%; 95% CI: 71.7, 73.1).
Given that this group of cases with scores 1–7 includes less than 5% of screening-detected cancers, it was estimated that this is an optimal cutoff point to differentiate likely normal examinations in the proposed AI-based strategies (negative predictive value, 99.98% [95% CI: 99.94, 99.99] in DM and 99.99% [95% CI: 99.95, 99.99] in DBT), similar to previous studies (20).

Simulated AI-based Strategy
The comparison of the original screening strategy with the AI-based strategy is presented in Table 3. Consistently across DM-based and DBT-based screening, a workload reduction of 71.5% (9100 of 31 974 reads; 95% CI: 70.6, 72.4) and 72.4% (8830 of 31 974 reads; 95% CI: 71.9, 72.9) was observed with the AI-based screening strategy.
The AI-based strategy resulted in noninferior sensitivity across different screening settings: 76 of 113 cancers detected in double reading of DM images versus 78 of 113 when using AI (relative difference, 2.63%; 95% CI: −4.9, 11.4; P = .68); 92 of 113 cancers detected in double reading of DBT images versus 95 of 113 when using AI (relative difference, 3.26%; 95% CI: −2.2, 9.4; P = .38); and 87 of 113 cancers detected in single reading of DBT images versus 90 of 113 when using AI (relative difference, 3.45%; 95% CI: −1.2, 9.8; P = .38).
When compared with double readings, the AI-based strategy was associated with an overall reduction in recall rate of 16.9% (671 of 15 987 women recalled with AI vs 807 of 15 987 without AI; 95% CI: 11.0, 24.0; P < .001) and 16.7% (588 of 15 987 women recalled with AI vs 706 of 15 987 without AI; 95% CI: 8.6, 23.4; P < .001) in DM and DBT double readings settings, respectively. When compared with single reading of DBT images, recall rate showed a nonsignificant increment (499 of 15 987 women recalled with AI vs 481 of 15 987 without AI; relative difference, 3.74%; 95% CI: −3.77, 12.83; P = .41).

Examinations Recalled Only by AI
In double reading of DM images, AI additionally recalled a total of 210 examinations (four of which were true-positive results). In double reading of DBT images, AI additionally recalled a total of 206 examinations (four of which were true-positive results). In single reading of DBT images, AI additionally recalled a total of 218 examinations (four of which were true-positive results). Therefore, in the group of additional cases recalled by AI only, the positive predictive value ranged from 1.8% to 1.9%.
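The sensitivity P values reported in this section can be approximately reproduced from the discordant pairs alone. The sketch below assumes the exact (binomial) form of the McNemar test, which the text does not specify, and infers the discordant counts from the reported figures: AI added four cancers in each setting, so moving from 76 to 78 detections implies two cancers lost in DM double reading, and moving from 92 to 95 implies one lost in DBT double reading. The function name is hypothetical.

```python
from math import comb


def mcnemar_exact_p(gained: int, lost: int) -> float:
    """Two-sided exact McNemar P value from the two discordant-pair counts."""
    n = gained + lost
    if n == 0:
        return 1.0
    k = min(gained, lost)
    # Binomial(n, 0.5) probability of a split at least as extreme, doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


BONFERRONI_ALPHA = 0.05 / 4  # four comparisons, as in the Statistical Analysis

p_dm = mcnemar_exact_p(gained=4, lost=2)   # DM double reading: 76 -> 78 of 113
p_dbt = mcnemar_exact_p(gained=4, lost=1)  # DBT double reading: 92 -> 95 of 113
print(p_dm, p_dbt, p_dm < BONFERRONI_ALPHA, p_dbt < BONFERRONI_ALPHA)
```

Under these assumptions the values are 0.6875 and 0.375, close to the reported P = .68 and P = .38, and neither falls below the Bonferroni-corrected threshold.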
Figure 6: Images obtained in mediolateral oblique (top) and craniocaudal (bottom) views. A, Digital mammography (DM) images and C, digital breast tomosynthesis (DBT)
images in a 65-year-old woman recalled only because of the original DBT readings. B, AI-processed DM images show a focal asymmetry (outlined in red in the mediolateral
oblique view by AI, with a region score of 94; nonrecalled at DM readings). The yellow diamond outlines a cluster of calcifications, with a region score of 42, not related to the
actual cancer. D, AI-processed DBT images show spiculated mass (also outlined by AI in both views, with a region score of 95). AI would have correctly recalled that cancer
lesion in both techniques (B and D). Grade I invasive ductal carcinoma of 18 mm was diagnosed at percutaneous biopsy. The AI examination score of this case was 10.
The four cancers added by AI at DM examinations were all originally screening-detected with DBT only (two of the four were ductal carcinoma in situ, one was a low-grade invasive ductal cancer, and one was a high-grade invasive ductal cancer).
Among the four cancers added by AI at DBT examinations, two were originally screening-detected at DM and two were interval cancers (in total, three of the four were ductal carcinoma in situ and one was a high-grade invasive ductal cancer). Thirteen of 15 interval cancers were not detected with any AI-based strategy (not present in the top 2% of suspicion among AI scores), although nine of these interval cancers are included in the group of examinations with AI scores of 8–10 (the top 30% of suspicion among AI scores).
Illustrative examples of examinations in the study where AI showed additional value are shown in Figures 5 and 6.

Comparison of Unaided Double Reading of DM Images with AI-based Double Reading of DBT Images
When comparing the AI-based strategy of DBT to the original double reading of DM (Table 4), it was observed that AI-based DBT screening would have been carried out with a smaller workload (156 hours vs 222 hours, a relative workload reduction of 29.7% [95% CI: 23.8, 36.2], P < .001). The sensitivity would have been 25.0% higher in relative terms (95% CI: 15.8, 36.3; P < .001), with 95 of 113 cancers detected with AI-DBT screening (84.1%; 95% CI: 76.2, 89.7) and 76 of 113 with unaided DM screening (67.3%; 95% CI: 58.2, 75.2). Moreover, the recall rate would have been 27.1% lower in relative terms (95% CI: 24.1, 30.3; P < .001), with 588 of 15 987 women recalled with AI-DBT screening (3.7%; 95% CI: 3.4, 4.0) and 807 of 15 987 women recalled with unaided DM screening (5.1%; 95% CI: 4.7, 5.4).
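The relative differences quoted for the comparison of AI-based DBT screening with unaided double reading of DM are simple ratios of the reported counts. The following sketch recomputes the point estimates; the function name is hypothetical, and the confidence intervals are not reproduced because the text does not state how they were derived.

```python
def relative_difference_pct(original: float, with_ai: float) -> float:
    """Percentage change of the AI-based value relative to the original value."""
    return 100 * (with_ai - original) / original


workload = relative_difference_pct(222, 156)     # hours: DM double vs AI-DBT
sensitivity = relative_difference_pct(76, 95)    # cancers detected, out of 113
recall = relative_difference_pct(807, 588)       # recalls, out of 15 987
print(f"{workload:.1f}%, {sensitivity:.1f}%, {recall:.1f}%")
```

Because sensitivity and recall rate share the same denominators in both arms (113 cancers and 15 987 examinations), the relative difference of the rates equals the relative difference of the raw counts.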
Table 4: Comparison of the AI Strategy for DBT with the Original Double Reading of DM Images
Discussion
Current breast cancer screening programs have a high workload for radiologists and an objectionable number of false-positive and false-negative assessments. Our findings highlight how an artificial intelligence (AI) system could reduce up to 70% of the workload in digital mammography (DM)– and digital breast tomosynthesis (DBT)–based breast cancer screening without reducing the sensitivity by 5% or more, indicating that workload can be reduced while maintaining the overall program sensitivity. This was achieved when this proportion of least suspicious examinations for AI would not be read by radiologists, while, at the same time, AI could be used as an additional complementary reader to recall cases not recalled by radiologists. Letting radiologists read this group of the 70% least suspicious examinations led to more recalls. Moreover, AI could be used to transition from DM screening to DBT screening with a 30% reduction in workload (P < .001), a 25% improvement in sensitivity (P < .001), and a 27% reduction in recall rate (P < .001).
Although several studies investigated how AI could reduce workload in screening programs with DM (18–20,25), to our knowledge, this is the first study to investigate AI-based strategies to reduce workload in DBT using real screening cohorts. Furthermore, because our study uses paired DM and DBT examinations, it was possible to determine AI-based strategies for DBT that could replace standard DM screening without increasing workload, one of the biggest limitations of introducing DBT into clinical practice. To our knowledge, our results have not been reported in any other comparison between DM- and DBT-based screening where transitioning to DBT is always associated with an increase in workload (11,26).
Previous studies in DM have suggested that it could be safe (ie, no sensitivity reduction) to use AI to reduce screening workload between 20% and 50% (18–20). In our study, we found this to be 70%. This threshold of 70% to define the optimal group of least suspicious examinations was proposed by Balta et al (20) using the same AI system in a DM screening cohort and could also be reproduced in our study (including DBT). In comparison, earlier studies using previous versions of the same AI system found that the group of the 20% least suspicious examinations would be the most optimal threshold (19), also suggesting how the continuous development of AI systems could keep bringing this threshold further up in the future.
Our study has limitations. It was only performed with data from a single site and single mammography and AI vendor. Moreover, because it was a retrospective study and the AI scenarios were simulated, it is not possible to know the impact on radiologists’ performance in the setting where they would, for example, read only the 30% most suspicious screening examinations. In addition, in the analysis of the AI system, readers were blinded to prior examinations, as opposed to radiologist screening assessments, which requires further analyses to understand the clinical impact of using AI in screening when AI does not include prior information. Finally, although our results suggest that no human reading of low-suspicion examinations would be the most optimal for the screening program cost-efficiency, further legal discussions would be needed to establish a framework where this strategy is safe for all the parties involved in screening.
In conclusion, our study shows a strategy with an artificial intelligence (AI) system where screening workload could be safely reduced up to 70% for both digital mammography (DM)– and digital breast tomosynthesis (DBT)–based programs, as well as allow the transition from DM- to DBT-based screening without an increase in workload. Given the increasing lack of expert breast radiologists as well as the increased workload associated with the introduction of DBT, new strategies potentially using AI could be necessary to maintain the cost-efficiency of screening programs. Further prospective studies are needed to validate our findings.

Acknowledgments: The authors thank the Department of Informatics at Hospital Universitario Reina Sofía for their help in retrieving images from picture archiving and communication system and their support in processing them.

Author contributions: Guarantors of integrity of entire study, J.L.R.P., M.Á.B.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, J.L.R.P., E.E.C., A.R.R., M.Á.B.; clinical studies, J.L.R.P., E.E.C., A.R.R., M.Á.B.; experimental studies, A.G.M.; statistical analysis, J.L.R.P.; and manuscript editing, J.L.R.P., S.R.M., A.G.M., A.R.R., M.Á.B.

Disclosures of Conflicts of Interest: J.L.R.P. disclosed no relevant relationships. S.R.M. disclosed no relevant relationships. E.E.C. disclosed no relevant relationships. A.G.M. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: is an employee of ScreenPoint Medical. Other relationships: disclosed no relevant relationships. A.R.R. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: is an employee of ScreenPoint Medical. Other relationships: disclosed no relevant relationships. M.Á.B. disclosed no relevant relationships.
References
1. Hakama M, Coleman MP, Alexe DM, Auvinen A. Cancer screening: evidence and practice in Europe 2008. Eur J Cancer 2008;44(10):1404–1413.
2. Smith RA, Cokkinides V, Brooks D, Saslow D, Brawley OW. Cancer screening in the United States, 2010: a review of current American Cancer Society guidelines and issues in cancer screening. CA Cancer J Clin 2010;60(2):99–119.
3. Boyd NF, Guo H, Martin LJ, et al. Mammographic density and the risk and detection of breast cancer. N Engl J Med 2007;356(3):227–236.
4. Mellado Rodríguez M, Osa Labrador AM. Breast cancer screening: current status [in Spanish]. Radiología 2013;55(4):305–314.
5. Castells X, Molins E, Macià F. Cumulative false positive recall rate and association with participant related factors in a population based breast cancer screening programme. J Epidemiol Community Health 2006;60(4):316–321.
6. GLOBOCAN. Cancer Today. International Agency for Research on Cancer. World Health Organization, 2018. http://gco.iarc.fr/today. Accessed August 21, 2020.
7. Skaane P, Bandos AI, Gullien R, et al. Prospective trial comparing full-field digital mammography (FFDM) versus combined FFDM and tomosynthesis in a population-based screening programme using independent double reading with arbitration. Eur Radiol 2013;23(8):2061–2071.
8. Ciatto S, Houssami N, Bernardi D, et al. Integration of 3D digital mammography with tomosynthesis for population breast-cancer screening (STORM): a prospective comparison study. Lancet Oncol 2013;14(7):583–589.
9. Lång K, Andersson I, Rosso A, Tingberg A, Timberg P, Zackrisson S. Performance of one-view breast tomosynthesis as a stand-alone breast cancer screening modality: results from the Malmö Breast Tomosynthesis Screening Trial, a population-based study. Eur Radiol 2016;26(1):184–190.
10. Pattacini P, Nitrosi A, Giorgi Rossi P, et al. Digital Mammography versus Digital Mammography Plus Tomosynthesis for Breast Cancer Screening: The Reggio Emilia Tomosynthesis Randomized Trial. Radiology 2018;288(2):375–385.
11. Caumo F, Zorzi M, Brunelli S, et al. Digital Breast Tomosynthesis with Synthesized Two-Dimensional Images versus Full-Field Digital Mammography for Population Screening: Outcomes from the Verona Screening Program. Radiology 2018;287(1):37–46.
12. Bernardi D, Ciatto S, Pellegrini M, et al. Application of breast tomosynthesis in screening: incremental effect on mammography acquisition and reading time. Br J Radiol 2012;85(1020):e1174–e1178.
13. Aase HS, Holen ÅS, Pedersen K, et al. A randomized controlled trial of digital breast tomosynthesis versus digital mammography in population-based screening in Bergen: interim analysis of performance indicators from the To-Be trial. Eur Radiol 2019;29(3):1175–1186.
14. Lehman CD, Wellman RD, Buist DS, et al. Diagnostic Accuracy of Digital Screening Mammography With and Without Computer-Aided Detection. JAMA Intern Med 2015;175(11):1828–1837.
15. Kim HE, Kim HH, Han BK, et al. Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study. Lancet Digit Health 2020;2(3):e138–e148.
16. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature 2020;577(7788):89–94.
17. Rodríguez-Ruiz A, Lång K, Gubern-Mérida A, et al. Stand-Alone Artificial Intelligence for Breast Cancer Detection in Mammography: Comparison With 101 Radiologists. J Natl Cancer Inst 2019;111(9):916–922.
18. Yala A, Schuster T, Miles R, Barzilay R, Lehman C. A Deep Learning Model to Triage Screening Mammograms: A Simulation Study. Radiology 2019;293(1):38–46.
19. Rodríguez-Ruiz A, Lång K, Gubern-Mérida A, et al. Can we reduce the workload of mammographic screening by automatic identification of normal exams with artificial intelligence? A feasibility study. Eur Radiol 2019;29(9):4825–4832.
20. Balta C, Rodríguez-Ruiz A, Mieskes C, Karssemeijer N, Heywang-Köbrunner SH. Going from double to single reading for screening exams labeled as likely normal by AI: what is the impact? In: Bosmans H, Marshall N, Van Ongeval C, eds. Proceedings of SPIE: 15th International Workshop on Breast Imaging (IWBI2020). Vol 11513. Bellingham, Wash: International Society for Optics and Photonics, 2020; 115130D.
21. Romero Martín S, Raya Povedano JL, Cara García M, Santos Romero AL, Pedrosa Garriguet M, Álvarez Benito M. Prospective study aiming to compare 2D mammography and tomosynthesis + synthesized mammography in terms of cancer detection and recall. From double reading of 2D mammography to single reading of tomosynthesis. Eur Radiol 2018;28(6):2484–2491.
22. Rodríguez-Ruiz A, Krupinski E, Mordang JJ, et al. Detection of Breast Cancer with Mammography: Effect of an Artificial Intelligence Support System. Radiology 2019;290(2):305–314.
23. Sasaki M, Tozaki M, Rodríguez-Ruiz A, et al. Artificial intelligence for breast cancer detection in mammography: experience of use of the ScreenPoint Medical Transpara system in 310 Japanese women. Breast Cancer 2020;27(4):642–651.
24. Dustler M, Dahlblom V, Tingberg A, Zackrisson S. The effect of breast density on the performance of deep learning-based breast cancer detection methods for mammography. In: Bosmans H, Marshall N, Van Ongeval C, eds. Proceedings of SPIE: 15th International Workshop on Breast Imaging (IWBI2020). Vol 11513. Bellingham, Wash: International Society for Optics and Photonics, 2020; 1151324.
25. Kyono T, Gilbert FJ, van der Schaar M. Improving Workflow Efficiency for Mammography Using Machine Learning. J Am Coll Radiol 2020;17(1 Pt A):56–63.
26. Dang PA, Freer PE, Humphrey KL, Halpern EF, Rafferty EA. Addition of tomosynthesis to conventional digital mammography: effect on image interpretation time of screening examinations. Radiology 2014;270(1):49–56.