Radiol 223351 PDF
Background: Most low- and middle-income countries lack access to organized breast cancer screening, and women with lumps may wait months for diagnostic assessment.

Purpose: To demonstrate that artificial intelligence (AI) software applied to breast US images obtained with low-cost portable equipment and by minimally trained observers could accurately classify palpable breast masses for triage in a low-resource setting.

Materials and Methods: This prospective multicenter study evaluated participants with at least one palpable mass who were enrolled at a hospital in Jalisco, Mexico, from December 2017 through May 2021. Orthogonal US images of any findings at the site of the lump and adjacent tissue, with and without calipers, were obtained first with portable US. Women were then imaged with standard-of-care (SOC) US, with Breast Imaging Reporting and Data System (BI-RADS) assessments by a radiologist. After exclusions, 758 masses in 300 women were analyzable by AI, with outputs of benign, probably benign, suspicious, and malignant. Sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were determined.

Results: The mean patient age ± SD was 50.0 years ± 12.5 (range, 18–92 years), and the mean largest lesion diameter was 13 mm ± 8 (range, 2–54 mm). Of 758 masses, 360 (47.5%) were palpable and 56 (7.4%) were malignant, including six ductal carcinomas in situ. AI correctly identified 47 or 48 of 49 women (96%–98%) with cancer with either portable US or SOC US images, with AUCs of 0.91 and 0.95, respectively. One circumscribed invasive ductal carcinoma, ipsilateral to a spiculated invasive ductal carcinoma, was classified as probably benign with SOC US. Of 251 women with benign masses, 168 (67%) imaged with SOC US were classified as benign or probably benign by AI, as were 96 of 251 (38%, P < .001) with portable US. AI performance with images obtained by a radiologist was significantly better than with images obtained by a minimally trained observer.

Conclusion: AI applied to portable US images of breast masses can accurately identify malignancies. Moderate specificity, which could triage 38%–67% of women with benign masses without tertiary referral, should further improve with AI and observer training with portable US.
© RSNA, 2023
Statistical Analysis
The main purpose of this study was to assess the diagnostic accuracy of AI applied to portable US images; we expected to show at least 40% specificity. With an estimated sample size of 500 women and/or masses (10% malignant) and target sensitivity of at least 95%, we expected the lower limit of the 95% CI of 0.88 for sensitivity and 0.40 for specificity if we observed 45% specificity. A sample size of 450 benign cases was estimated to have greater than 80% power to detect a difference in specificity of 0.55 versus 0.5, allowing no more than 13% discordance between portable US and SOC US with use of a two-sided McNemar test with α = .05. PASS 2021 (NCSS) was used for power calculations.

Sensitivity, specificity, negative predictive value, and area under the receiver operating characteristic curve (AUC) were determined. As per the guidance chapter of the BI-RADS fifth edition for diagnostic breast imaging (14), a benign or probably benign assessment for a malignant lesion was considered a false negative when reporting sensitivity. Sensitivity, specificity, and negative predictive value were estimated with use of hierarchical Poisson regression with generalized estimating equations (15). The differences between modalities were compared with use of additive generalized estimating equations models for the absolute differences between rates. For AUC, we used the numeric output from the AI software or categorical ordinal BI-RADS assessments given by the radiologists. Evaluation of AUCs was based on the nonparametric method by DeLong et al (16). We analyzed the subsets of lesions imaged with portable US by the radiologist versus those imaged by minimally trained research coordinators. Statistical calculations were performed with use of SAS 9.4 or R (RStudio, version 1.4.1717 [2021]).

Results
From December 11, 2017, through May 21, 2021, US images were documented for 1216 breast masses, with 126 malignant (10.4%), in 478 Hispanic women. After exclusions detailed in Figure 1, the final lesion-level analysis set included 758 masses from 300 women, with an average age (±SD) of 50.0 years ± 12.5 (range, 18–92 years). Of the 758 masses, the average largest diameter was 13 mm ± 8 (range, 2–54 mm) and 360 (47.5%) were palpable; 56 of 758 (7.4%) were malignant, as were 41 of 360 (11.4%) of the subset of palpable masses. Among the 56 malignancies, 50 (89%) were invasive ductal carcinoma and six were ductal carcinoma in situ. Benign lesions are detailed in Appendix S1. For the 300 index lesions, the average largest diameter was 15 mm ± 9 (range, 2–54 mm), 167 (55.7%) were palpable, and 49 (16.3%) were malignant.

Table 1: Performance of AI on Breast Masses Imaged with Portable Low-Cost US and Minimally Trained Observers versus SOC Equipment and Performance by a Specialist Radiologist

Sensitivity and Specificity
Table 1 and Figure S1 (Appendix S1) detail the performance of the AI system. At the participant level, of 49 women with cancer, AI accurately identified 47 (96%, portable US) or 48 (98%, SOC US) and could have triaged 96 (38%, portable US) or 168 (67%, SOC US) of 251 women with benign lesions to routine care. At the lesion level, 53 of 56 malignancies (95%) and up to 554 of 702 benign masses (79%) were correctly classified, with an AUC of 0.95, compared with a radiologist AUC of 0.98 (P = .06). There were four unique malignant masses (two palpable and two nonpalpable) assessed as benign or probably benign by the AI software: two misclassified with both portable and SOC US, one with portable US only, and one with SOC US only. On review, each was a circumscribed, oval, hypoechoic mass, and three of the four were low nuclear grade ductal carcinoma in situ (Fig 2). The fourth malignancy misclassified as probably benign by the AI was a grade 3 invasive ductal carcinoma, which was a second, nonindex oval circumscribed mass with internal vascularity and posterior enhancement in a woman with a spiculated invasive ductal carcinoma elsewhere in the same breast (Fig 3); this false-negative mass was excluded from the participant-level analysis. Incidentally, a fifth malignancy due to ductal carcinoma in situ and resembling a cyst was lacking SOC US images for review and was, therefore, excluded from the final analysis set; it was misclassified as BI-RADS 3, probably benign, by the radiologist and assessed as suspicious by AI.

Of 702 benign masses, 554 (79%) imaged with SOC US were correctly assessed as benign or probably benign by AI, including 43 masses that were considered suspicious (BI-RADS 4A by the radiologist) and biopsied clinically. Specificity was much lower for AI with use of the portable US images, at 340 of 702 benign masses (48%, P < .001).

Operator Dependence
Considering the subset of 204 women with portable US images obtained by the radiologist, there were 603 analyzable lesions (35 [5.8%] were malignant). The AUC of AI was 0.98, sensitivity was 97%–100% (34 or 35 of 35 malignant lesions), and specificity was 52%–80% (296–457 of 568 benign lesions) (Table 2). These results were significantly better (all P < .001) than for portable US images obtained by minimally trained research coordinators in the subset of 155 analyzable lesions (21 [13.5%] were malignant) in 96 women. The AUC of AI for this subset was 0.78, sensitivity was 86% (18 of 21 malignant lesions), and specificity was 33% (44 of 134 benign lesions). Results from images obtained with portable US were generally not different from those with SOC equipment when distinguished by operator, except that specificity remained significantly lower with portable US images (Table 2).

Discussion
In this analysis, 47 or 48 of 49 women (96%–98%) with cancer depicted by use of US and 96–168 of 251 women (38%–67%) without cancer would have been triaged appropriately by Koios DS artificial intelligence (AI) software, with better performance AUC of 0.876 with low-frequency transducer US images not different from an AUC of 0.893 with images from a high-frequency transducer.

There are other AI tools developed for breast US. Shen et al (11) developed and validated AI with a stand-alone AUC of 0.976 on a test set of more than 44 000 examinations. When radiologists retrospectively reviewed images with this software, false-positive recalls decreased by 37% and benign biopsies by nearly 28%. S-Detect software (Samsung Medison) performs BI-RADS feature extraction with outputs of possibly malignant or possibly benign. Inexperienced radiologists benefit most from this approach (18), with the greatest improvements in specificity (19–23).

Current AI assessments of each lesion are independent and do not consider the influence of concurrent lesions elsewhere in the same woman. In the American College of Radiology Imaging Network 6666 protocol, multiple bilateral circumscribed oval masses (assessed as a “single” overall finding) could be safely assessed as benign, BI-RADS 2, with 0 of 153 such lesions malignant (95% CI: 0, 2.4) in 135 women (24). Among these 135 women, 82 also had a solitary suspicious mass, with two of those malignant. In a patient with concurrent malignancy, otherwise BI-RADS 3 masses overall had an 11% rate of malignancy in the series by Kim et al (13). When in the same quadrant as the known cancer, 21.2% (36 of 170 masses) were malignant, as were 9.8% of ipsilateral masses (12 of 122) in a different quadrant and 4.2% (eight of 190) in the contralateral breast. Augmenting current AI to consider concurrent breast lesions in the ipsilateral or contralateral breast may improve performance.

Table 2: Comparison of AI Performance with Breast US Images Obtained by Observers with Differing Experience
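The headline accuracy figures above reduce to simple proportions, and the nonparametric AUC named in the Statistical Analysis section is the Mann–Whitney statistic to which the DeLong et al (16) method attaches a variance. The sketch below is illustrative Python only, not the authors' SAS/R code: it omits the DeLong variance and the GEE adjustment for multiple masses per woman, and an exact binomial McNemar test stands in for the paired modality comparison. The counts fed to `sens_spec` are the participant-level SOC US results reported above (48 of 49 cancers identified; 168 of 251 benign masses triaged).

```python
# Hedged sketch of the accuracy metrics; a minimal stand-in, not the study's
# actual SAS 9.4 / R analysis pipeline.
from math import comb

def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores_pos, scores_neg):
    """Nonparametric AUC: P(malignant score > benign score) + 0.5*P(tie).
    This Mann-Whitney estimate is the point estimate the DeLong method uses;
    the DeLong variance/covariance machinery is omitted here."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def mcnemar_exact(b, c):
    """Exact two-sided McNemar P value from the two discordant-pair counts
    (cases one modality got right and the other got wrong)."""
    n, k = b + c, min(b, c)
    p_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * p_tail)

# Participant-level SOC US counts from the Results section:
se, sp = sens_spec(tp=48, fn=1, tn=168, fp=83)
print(f"sensitivity {se:.0%}, specificity {sp:.0%}")  # prints: sensitivity 98%, specificity 67%
```

Ordinal AI categories (benign < probably benign < suspicious < malignant) can be passed to `auc` as the scores 1–4, mirroring the use of categorical ordinal BI-RADS assessments described in the Statistical Analysis section.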
There were limitations to our study. The Vscan Extend US system was never approved by the U.S. Food and Drug Administration for clinical breast US work, and there was no training of the Koios DS algorithms with such images. The low spatial resolution of the transducer used, combined with the lack of system training of the software, likely explains the reduced specificity observed with images obtained on this portable system. GE HealthCare has since updated this portable handheld platform, and the new Vscan Air is equipped with a wireless higher-frequency (L3–12 MHz), wider-footprint (40 mm) transducer that is approved by the Food and Drug Administration for breast imaging. These system specifications are now similar to those of the SOC equipment used in the American College of Radiology Imaging Network 6666 protocol (25) and should improve the diagnostic performance of both the radiologist and the AI and allow better imaging of larger masses. There are other handheld low-cost wireless US systems with similar specifications currently available. The radiologist's performance in this study appears artificially high, with 100% sensitivity and specificity as high as 87%, in part because of lack of follow-up and exclusion of many masses assessed as negative, benign, or probably benign. Portable US images were not interpreted by a radiologist, so we do not know how the AI performance with those images compares with that of a radiologist. We did not include risk factors, clinical features such as patient age, or findings such as skin retraction or nipple discharge. Doppler and elastography are not currently evaluated with the AI software, and we excluded lymph nodes, skin lesions, and normal tissue areas because the software has not yet been trained on those types of findings.

In conclusion, radiologists using low-cost portable handheld US can generate images of breast masses adequate for accurate artificial intelligence (AI) classification. Although specificity was less than with standard-of-care equipment, AI applied to portable breast US can potentially eliminate about half of unnecessary referrals for benign lesions in resource-limited regions. These favorable results were observed despite lack of training of the AI software with images from the device used, and current portable US has improved specifications. We did not show that untrained observers could produce adequately diagnostic images. Additional training of affiliated healthcare workers in image acquisition, improved equipment, and further system training of the AI software on such masses are expected to further improve overall performance and allow effective triage of women with palpable lumps in low- and middle-income countries.

Acknowledgments: The authors are grateful to GE HealthCare for providing the Vscan portable US systems used in acquisition of images and to Koios Medical for providing the AI software.

Author contributions: Guarantors of integrity of entire study, W.A.B., A.L.L.A., A.J., J.C.L.P., C.Y.G.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, W.A.B., A.L.L.A., A.J., L.H.L.; clinical studies, A.L.L.A., A.J., J.C.L.P., C.Y.G., L.H.L., M.T.S.d.L., S.L.; experimental studies, A.J., J.C.L.P.; statistical analysis, W.A.B., A.J., R.C.M., S.Y.C.; and manuscript editing, W.A.B., A.J., R.C.M., S.Y.C., L.H.L.

Disclosures of conflicts of interest: W.A.B. Received grant support to the Department of Radiology from Koios Medical for a separate study where she is the principal investigator; voluntary Chief Scientific Advisor to DenseBreast-info.org. A.L.L.A. No relevant relationships. A.J. Employee of Koios Medical. J.C.L.P. No relevant relationships. C.Y.G. No relevant relationships. R.C.M. Employee of Koios Medical. S.Y.C. No relevant relationships. L.H.L. No relevant relationships. M.T.S.d.L. No relevant relationships. S.L. No relevant relationships.
References
1. Anderson BO, Distelhorst SR. Guidelines for International Breast Health and Cancer Control--Implementation. Introduction. Cancer 2008;113(8 Suppl):2215–2216.
2. Palacio-Mejía LS, Lazcano-Ponce E, Allen-Leigh B, Hernández-Ávila M. Regional differences in breast and cervical cancer mortality in Mexico between 1979-2006 [in Spanish]. Salud Publica Mex 2009;51(Suppl 2):s208–s219.
3. Lei S, Zheng R, Zhang S, et al. Global patterns of breast cancer incidence and mortality: A population-based cancer registry data analysis from 2000 to 2020. Cancer Commun (Lond) 2021;41(11):1183–1194.
4. Ginsburg O, Yip CH, Brooks A, et al. Breast cancer early detection: A phased approach to implementation. Cancer 2020;126(Suppl 10):2379–2393.
5. Sterns EE. Age-related breast diagnosis. Can J Surg 1992;35(1):41–45.
6. Houssami N, Ciatto S, Irwig L, Simpson JM, Macaskill P. The comparative sensitivity of mammography and ultrasound in women with breast symptoms: an age-specific analysis. Breast 2002;11(2):125–130.
7. Lehman CD, Lee CI, Loving VA, Portillo MS, Peacock S, DeMartini WB. Accuracy and value of breast ultrasound for primary imaging evaluation of symptomatic women 30-39 years of age. AJR Am J Roentgenol 2012;199(5):1169–1177.
8. Love SM, Berg WA, Podilchuk C, et al. Palpable Breast Lump Triage by Minimally Trained Operators in Mexico Using Computer-Assisted Diagnosis and Low-Cost Ultrasound. J Glob Oncol 2018;4(4):1–9.
9. Berg WA, Gur D, Bandos AI, et al. Impact of Original and Artificially Improved Artificial Intelligence–based Computer-aided Diagnosis on Breast US Interpretation. J Breast Imaging 2021;3(3):301–311.
10. Mango VL, Sun M, Wynn RT, Ha R. Should We Ignore, Follow, or Biopsy? Impact of Artificial Intelligence Decision Support on Breast Ultrasound Lesion Assessment. AJR Am J Roentgenol 2020;214(6):1445–1452.
11. Shen Y, Shamout FE, Oliver JR, et al. Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams. Nat Commun 2021;12(1):5645.
12. Mendelson EB, Böhm-Vélez M, Berg WA, et al. ACR BI-RADS Ultrasound. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. Reston, Va: American College of Radiology, 2013.
13. Kim SJ, Ko EY, Shin JH, et al. Application of sonographic BI-RADS to synchronous breast nodules detected in patients with breast cancer. AJR Am J Roentgenol 2008;191(3):653–658.
14. Sickles EA, D’Orsi CJ. Follow-up and outcome monitoring. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. Reston, Va: American College of Radiology, 2013.
15. Sternberg MR, Hadgu A. A GEE approach to estimating sensitivity and specificity and coverage properties of the confidence intervals. Stat Med 2001;20(9-10):1529–1539.
16. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44(3):837–845.
17. Pace LE, Dusengimana JV, Hategekimana V, et al. Clinical Diagnoses and Outcomes After Diagnostic Breast Ultrasound by Nurses and General Practitioner Physicians in Rural Rwanda. J Am Coll Radiol 2022;19(8):983–989.
18. Park HJ, Kim SM, La Yun B, et al. A computer-aided diagnosis system using artificial intelligence for the diagnosis and characterization of breast masses on ultrasound: Added value for the inexperienced breast radiologist. Medicine (Baltimore) 2019;98(3):e14146.
19. Choi JS, Han BK, Ko ES, et al. Effect of a Deep Learning Framework-Based Computer-Aided Diagnosis System on the Diagnostic Performance of Radiologists in Differentiating between Malignant and Benign Masses on Breast Ultrasonography. Korean J Radiol 2019;20(5):749–758.
20. Choi JH, Kang BJ, Baek JE, Lee HS, Kim SH. Application of computer-aided diagnosis in breast ultrasound interpretation: improvements in diagnostic performance according to reader experience. Ultrasonography 2018;37(3):217–225.
21. Wu JY, Zhao ZZ, Zhang WY, et al. Computer-Aided Diagnosis of Solid Breast Lesions With Ultrasound: Factors Associated With False-negative and False-positive Results. J Ultrasound Med 2019;38(12):3193–3202.
22. Kim S, Choi Y, Kim E, et al. Deep learning-based computer-aided diagnosis in screening breast ultrasound to reduce false-positive diagnoses. Sci Rep 2021;11(1):395.
23. Nicosia L, Addante F, Bozzini AC, et al. Evaluation of computer-aided diagnosis in breast ultrasonography: Improvement in diagnostic performance of inexperienced radiologists. Clin Imaging 2022;82:150–155.
24. Berg WA, Zhang Z, Cormack JB, Mendelson EB. Multiple bilateral circumscribed masses at screening breast US: consider annual follow-up. Radiology 2013;268(3):673–683.
25. Berg WA, Blume JD, Cormack JB, et al. Combined screening with ultrasound and mammography vs mammography alone in women at elevated risk of breast cancer. JAMA 2008;299(18):2151–2163.