
ORIGINAL RESEARCH • BREAST IMAGING

Toward AI-supported US Triage of Women with Palpable Breast Lumps in a Low-Resource Setting
Wendie A. Berg, MD, PhD  •  Ana-Lilia López Aldrete, MD  •  Ajit Jairaj, BS  •  Juan Carlos Ledesma Parea, MD  • 
Claudia Yolanda García, MD  •  R. Chad McClennan, MBA  •  Steven Yong Cen, PhD  •  Linda H. Larsen, MD  •
M. Teresa Soler de Lara, MS  •  Susan Love, MD
From the Department of Radiology, University of Pittsburgh School of Medicine, Magee-Womens Hospital, 300 Halket St, Pittsburgh, PA 15213 (W.A.B.); Departments
of Gynecology (A.L.L.A., C.Y.G.) and Radiology (J.C.L.P.), Hospital Valentín Gómez Farias, Zapopan, Mexico; Koios Medical, New York, NY (A.J., R.C.M.); Department
of Radiology, Keck School of Medicine of USC, Los Angeles, Calif (S.Y.C., L.H.L.); and Dr Susan Love Research Foundation, West Hollywood, Calif (M.T.S.d.L., S.L.).
Received January 4, 2023; revision requested February 1; revision received February 20; accepted March 15. Address correspondence to W.A.B. (email: wendieberg@gmail.com).
Supported by the National Cancer Institute (UH3CA189966) and the Dr Susan Love Research Foundation.
Conflicts of interest are listed at the end of this article.
See also the editorial by Slanetz in this issue.
Radiology 2023; 000:e223351  •  https://doi.org/10.1148/radiol.223351

Background:  Most low- and middle-income countries lack access to organized breast cancer screening, and women with lumps may
wait months for diagnostic assessment.

Purpose:  To demonstrate that artificial intelligence (AI) software applied to breast US images obtained with low-cost portable equipment
and by minimally trained observers could accurately classify palpable breast masses for triage in a low-resource setting.

Materials and Methods:  This prospective multicenter study evaluated participants with at least one palpable mass who were enrolled in a
hospital in Jalisco, Mexico, from December 2017 through May 2021. Orthogonal US images, with and without calipers, of any findings
at the site of the lump and in adjacent tissue were obtained first with portable US. Women were then imaged with standard-of-care
(SOC) US, with Breast Imaging Reporting and Data System assessments by a radiologist. After exclusions, 758 masses in 300 women
were analyzable by AI, with outputs of benign, probably benign, suspicious, and malignant. Sensitivity, specificity, and area under the
receiver operating characteristic curve (AUC) were determined.

Results:  The mean patient age ± SD was 50.0 years ± 12.5 (range, 18–92 years) and mean largest lesion diameter was 13 mm ± 8
(range, 2–54 mm). Of 758 masses, 360 (47.5%) were palpable and 56 (7.4%) malignant, including six ductal carcinoma in situ. AI
correctly identified 47 or 48 of 49 women (96%–98%) with cancer with either portable US or SOC US images, with AUCs of 0.91
and 0.95, respectively. One circumscribed invasive ductal carcinoma was classified as probably benign with SOC US, ipsilateral to a
spiculated invasive ductal carcinoma. Of 251 women with benign masses, 168 (67%) imaged with SOC US were classified as benign
or probably benign by AI, as were 96 of 251 masses (38%, P < .001) with portable US. AI performance with images obtained by a
radiologist was significantly better than with images obtained by a minimally trained observer.

Conclusion:  AI applied to portable US images of breast masses can accurately identify malignancies. Moderate specificity, which could
triage 38%–67% of women with benign masses without tertiary referral, should further improve with AI and observer training with
portable US.
© RSNA, 2023

Supplemental material is available for this article.

Standards of care (SOCs) must be appropriate for the environments where they are practiced (1). While screening has been the focus in Western countries, low- and middle-income countries (LMIC) often lack access to organized screening programs and technology. In LMIC, breast cancer most commonly presents as a palpable lump, and women under age 50 years are overrepresented (2), with the peak age at diagnosis more than 10 years earlier in some Asian and African countries than in the United States or Europe (3). “Early detection” is the recognition of symptomatic breast cancer at an early stage and is the priority of the Breast Health Global Initiative in LMIC (4). Enabling early detection requires educational efforts to increase awareness, as well as efforts to improve access to imaging of breast lumps and other symptoms, pathology, and treatment. US can play a critical role in early detection, with resulting social and economic benefits of more effective, less invasive treatment and improved outcomes.

The most common breast lumps (ie, cysts, fibroadenomas, and cancers) are usually distinct with US, with fewer than 10% representing malignancy in women under age 55 years (5). In an analysis of 3799 consecutively presenting breast cancers in women with symptoms, US was more sensitive than mammography for women under age 48 years (6). A study of 954 women between the ages of 30 and 39 years with 1208 symptomatic areas showed high sensitivity of US at 95.7% (22 of 23 malignancies) and negative predictive value of 99.9%, while adjunct mammography provided minimal additional value, helping to detect only one additional malignancy; the authors concluded that US should be the primary method of evaluation (7).

US of palpable breast masses, as currently performed, requires appropriate equipment and trained professional
staff for performance and interpretation. A handheld portable US device that could provide a stand-alone, low-cost solution without requiring highly trained medical professionals would be of great utility in LMIC and in remote, rural locations with limited resources. Preliminary evidence indicates that, with minimal training, ancillary medical staff (eg, nurses, medical students, and physicians outside of radiology) can obtain adequate breast US images with use of a low-cost portable US device (8). Such images could be interpreted remotely by a radiologist, locally by artificial intelligence (AI), or both. Several studies have shown that AI can provide breast US mass classification on par with a specialist radiologist (9,10) or even better (11).

In LMIC, most women present with either an obvious locally advanced breast cancer or a palpable lump detected with breast self-examination or clinical breast examination. While breast cancer requires referral to a hospital for treatment, the vast majority (80%–90%) of palpable lumps are benign. A woman is often seen in a remote, under-resourced rural center for a breast examination and then may wait months for diagnostic imaging and biopsy at a referral center. If her lump can be shown to be typically benign with US, such as a cyst or normal tissue, then this could obviate hospital referral and increase availability of limited resources for other women who actually have cancer.

The purpose of this study was to demonstrate that AI software applied to breast US images obtained on low-cost portable equipment and also by minimally trained observers could accurately classify palpable breast masses for triage in a low-resource setting.

Abbreviations
AI = artificial intelligence, AUC = area under the receiver operating characteristic curve, BI-RADS = Breast Imaging Reporting and Data System, LMIC = low- and middle-income countries, SOC = standard of care

Summary
Although specificity was less than with standard-of-care equipment, artificial intelligence applied to portable breast US can potentially reduce about half of tertiary referrals in resource-limited regions.

Key Results
■ In this prospective study, artificial intelligence (AI) using standard-of-care (SOC) US images of 758 masses accurately classified 53 of 56 malignancies (95%) and 554 of 702 benign masses (79%), with an area under the receiver operating characteristic curve of 0.95.
■ With use of low-cost portable US images obtained by a radiologist on 603 masses, AI correctly classified all 35 malignancies (100% sensitivity) but showed reduced specificity at 296 of 568 benign masses (52.1%) versus 457 of 568 benign masses (80.5%) with images from SOC US (P < .001).
■ AI performance was reduced with images obtained with low-cost portable US by an untrained observer.

Materials and Methods
GE HealthCare provided two Vscan US units used in this study, and Koios Medical processed all US images without human intervention. Authors who are not affiliated with these companies retained full control of data and information submitted for publication.

In a prospective multicenter trial, we sought to enroll 500 women older than 18 years, each with at least one palpable breast lump. From December 11, 2017, through May 21, 2021, women were recruited in an institutional review board–approved Health Insurance Portability and Accountability Act–compliant protocol and provided written, informed consent at one of two centers in Mexico: Hospital Valentin Gomez Farias in Zapopan, Jalisco, Mexico, or Hospital General de Tijuana in Tijuana, Mexico. The Tijuana site was closed after enrolling nine patients, because they were not able to provide supporting data. Enrollment was suspended from May 14, 2019, through December 12, 2020, first to amend the protocol to allow minimally trained personnel to obtain images, then in May 2020 due to COVID-19.

Targeted US was performed twice on each participant. First, orthogonal images with and without calipers were obtained of breast masses with use of the low-cost portable Vscan Extend laptop–based US unit equipped with a 2.9-cm, 8.0–3.3 MHz linear-array transducer. If the area of the palpable lump appeared to represent normal variant fatty lobulation or underlying dense fibroglandular tissue, no images were documented. If incidental adjacent nonpalpable breast masses were seen, they were also documented on both US systems. The first 376 women were scanned by a specialist breast imaging radiologist (J.C.L.P., with 5 years of experience). The subsequent 102 women were scanned by one of two nonphysician research coordinators who had been trained in use of the portable US device with use of a previously validated 30-minute PowerPoint (Microsoft) presentation detailed by Love et al (8). Second, all women were also scanned by the specialist radiologist (J.C.L.P.) with use of a high-end cart-based SOC device (Hi-Vision Avius, equipped with a 5-cm, 13–5 MHz linear-array transducer; Hitachi Medical), and orthogonal images with and without calipers were obtained.

For each breast mass, the radiologist specialist recorded patient age, race, and ethnicity, whether or not a given mass was the palpable “index” lesion, its maximal diameter, and a Breast Imaging Reporting and Data System (BI-RADS) final assessment (12), as follows: 1, negative; 2, benign; 3, probably benign; 4A, low suspicion; 4B, moderate suspicion; 4C, high suspicion; or 5, highly suggestive of malignancy.

US AI
Deidentified paired US images from both Vscan and Hitachi imaging were processed by Koios DS version 3.x (Koios Medical). If there were more than four images of a given lesion, all were processed. There were no cine loops. This AI software automatically segments lesions and can make use of calipers with images to confirm lesion boundaries. System-generated outputs are benign, probably benign, suspicious, and malignant (corresponding to BI-RADS 1 or 2, BI-RADS 3, BI-RADS 4A or 4B, and BI-RADS 4C or 5, respectively). A numeric quantitative score from 0 to 1 is also output by the software, which approximates relative risk of malignancy. An assessment of benign or probably benign for a malignant lesion was considered a false negative. The AI used in this study has been trained with more than 700 000 images from 40 clinical sites, representing all major manufacturers, including 17 different models and a wide range of transducer frequencies, and this software is currently embedded on all SOC breast US equipment from GE HealthCare and on many picture archiving and communication systems.
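
The correspondence between the software's categorical outputs and BI-RADS assessments, together with the false-negative rule stated above, can be illustrated with a minimal Python sketch. The names `AIOutput`, `BIRADS_EQUIVALENT`, and `classify_outcome` are hypothetical and are not part of Koios DS. Treating suspicious or malignant outputs as test positive is an assumption inferred from the stated false-negative definition.

```python
from enum import Enum


class AIOutput(Enum):
    """The four categorical outputs described in the text."""
    BENIGN = "benign"
    PROBABLY_BENIGN = "probably benign"
    SUSPICIOUS = "suspicious"
    MALIGNANT = "malignant"


# BI-RADS categories to which each output corresponds, as stated in the text.
BIRADS_EQUIVALENT = {
    AIOutput.BENIGN: ("1", "2"),
    AIOutput.PROBABLY_BENIGN: ("3",),
    AIOutput.SUSPICIOUS: ("4A", "4B"),
    AIOutput.MALIGNANT: ("4C", "5"),
}


def is_test_negative(output: AIOutput) -> bool:
    """Benign and probably benign outputs count as test negative."""
    return output in (AIOutput.BENIGN, AIOutput.PROBABLY_BENIGN)


def classify_outcome(output: AIOutput, is_malignant: bool) -> str:
    """Label one lesion against its reference standard (hypothetical helper)."""
    negative = is_test_negative(output)
    if is_malignant:
        # Stated rule: benign or probably benign for a malignant lesion
        # is a false negative.
        return "false negative" if negative else "true positive"
    return "true negative" if negative else "false positive"
```

Under this sketch, a malignant lesion with a probably benign output is labeled a false negative, which matches the definition used for sensitivity in this study.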

Exclusions and Multiple Lesions per Participant


We excluded participants with nonbreast malignancies, incomplete data, data mismatch between the radiology database and pathology reports, or missing images (some Hitachi images were exported to discs, deleted from clinical systems, and the discs became unrecoverable). Because AI software has not yet been developed to assess lymph nodes, skin lesions, calcifications, scars, or normal variants, those lesion types were excluded. Lesions reported to have a horizontal diameter larger than the 29-mm horizontal field of view of the portable US transducer were reviewed and excluded if the entire lesion could not be included with a single image (ie, when multiple tiled images were required to include all margins), as the AI software was not trained for this circumstance. BI-RADS 3 masses without follow-up or with suspicious change noted at follow-up but no biopsy results available were also excluded.

Because of the potential influence of multiple lesions per participant, with higher likelihood of malignancy for other masses depicted with help of US synchronous to current malignancy (13), we analyzed results two ways. First, at the participant level, we considered only a single mass per participant, prioritizing palpable masses first. If there was more than one palpable mass, we retained the malignant one. If there was more than one palpable malignancy, we arbitrarily retained the first one listed in the database by site investigators. Second, we evaluated lesion-level results. For all malignancies assessed as benign or probably benign by AI or the radiologist, a radiologist (W.A.B., with 30 years of experience in breast imaging) reviewed the images.

Figure 1:  Flowchart shows study population, exclusions, and final analysis set. Multiple lesions per participant were excluded. The palpable mass was retained. If there were multiple palpable masses, the malignant mass was retained. If there were multiple palpable malignancies, we arbitrarily retained the first lesion listed by the site radiologist. BI-RADS = Breast Imaging Reporting and Data System.
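
The participant-level selection rule described above can be sketched in Python as follows. This is an illustrative reading of the stated rule, not the study's actual code. The `Mass` record and `select_index_mass` function are hypothetical names, and two fallbacks are assumptions not spelled out in the text: when no mass is palpable, all masses are considered, and when no candidate is malignant, the first one listed is kept.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Mass:
    """Hypothetical per-lesion record, in database listing order."""
    mass_id: str
    palpable: bool
    malignant: bool


def select_index_mass(masses: Sequence[Mass]) -> Mass:
    """Pick one mass per participant for the participant-level analysis."""
    if not masses:
        raise ValueError("participant has no analyzable masses")

    # Stated rule: palpable masses are prioritized first.
    palpable = [m for m in masses if m.palpable]
    # Assumption: if no mass is palpable, consider all masses.
    candidates = palpable if palpable else list(masses)

    # Stated rule: among several candidates, retain the malignant one;
    # among several malignancies, retain the first one listed.
    malignant = [m for m in candidates if m.malignant]
    if malignant:
        return malignant[0]

    # Assumption: with no malignancy among candidates, keep the first listed.
    return candidates[0]
```

For example, a participant with a nonpalpable malignancy listed first and two palpable malignancies listed afterward would contribute the first palpable malignancy as her index mass.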

Statistical Analysis
The main purpose of this study was to assess the diagnostic accuracy of AI applied to portable US images; we expected to show at least 40% specificity. With an estimated sample size of 500 women and/or masses (10% malignant) and target sensitivity of at least 95%, we expected the lower limit of the 95% CI of 0.88 for sensitivity and 0.40 for specificity if we observed 45% specificity. A sample size of 450 benign cases was estimated to have greater than 80% power to detect a difference in specificity of 0.55 versus 0.5, allowing no more than 13% discordance between portable US and SOC US with use of a two-sided McNemar test with α = .05. PASS 2021 (NCSS) was used for power calculations.

Sensitivity, specificity, negative predictive value, and area under the receiver operating characteristic curve (AUC) were determined. As per the guidance chapter of BI-RADS fifth edition for diagnostic breast imaging (14), a benign or probably benign assessment for a malignant lesion was considered a false negative when reporting sensitivity. Sensitivity, specificity, and negative predictive value were estimated with use of hierarchical Poisson regression with generalized estimating equations (15). The differences between modalities were compared with use of additive generalized estimating equations models for the absolute differences between rates. For AUC, we used the numeric output from the AI software or categorical ordinal BI-RADS assessments given by the radiologists. Evaluation of AUCs was based on the nonparametric method by DeLong et al (16). We analyzed the subsets of lesions imaged with portable US by the radiologist versus those imaged by minimally trained research coordinators. Statistical calculations were performed with use of SAS 9.4 or R (RStudio, version 1.4.1717 [2021]).

Results
From December 11, 2017, through May 21, 2021, US images were documented for 1216 breast masses, with 126 malignant (10.4%), in 478 Hispanic women. After exclusions detailed in Figure 1, the final lesion-level analysis set included 758 masses from 300 women, with an average age (±SD) of 50.0 years ± 12.5 (range, 18–92 years). Of the 758 masses, the average largest diameter was 13 mm ± 8 (range, 2–54 mm) and 360 (47.5%) were palpable; 56 of 758 (7.4%) were malignant, as were 41 of 360 (11.4%) of the subset of palpable masses. Among the 56 malignancies, 50 (89%) were invasive ductal carcinoma and
six were ductal carcinoma in situ. Benign lesions are detailed in Appendix S1. For the 300 index lesions, the average largest diameter was 15 mm ± 9 (range, 2–54 mm), 167 (55.7%) were palpable, and 49 (16.3%) were malignant.

Sensitivity and Specificity
Table 1 and Figure S1 (Appendix S1) detail the performance of the AI system. At the participant level, of 49 women with cancer, AI accurately identified 47 (96%, portable US) or 48 (98%, SOC US) and could have triaged 96 (38%, portable US) or 168 (67%, SOC US) of 251 women with benign lesions to routine care. At the lesion level, 53 of 56 malignancies (95%) and up to 554 of 702 benign masses (79%) were correctly classified, with an AUC of 0.95, compared with radiologist AUC of 0.98 (P = .06). There were four unique malignant masses (two palpable and two nonpalpable) assessed as benign or probably benign by the AI software, two misclassified with both portable and SOC US, one with portable US only, and one with SOC US only. On review, each was a circumscribed, oval, hypoechoic mass, and three of the four were low nuclear grade ductal carcinoma in situ (Fig 2). The fourth malignancy misclassified as probably benign by the AI was a grade 3 invasive ductal carcinoma, which was a second, nonindex oval circumscribed mass with internal vascularity and posterior enhancement in a woman with a spiculated invasive ductal carcinoma elsewhere in the same breast (Fig 3); this false-negative mass was excluded from the participant-level analysis. Incidentally, a fifth malignancy due to ductal carcinoma in situ and resembling a cyst was lacking SOC US images for review and, therefore, excluded from the final analysis set; it was misclassified as BI-RADS 3, probably benign, by the radiologist and assessed as suspicious by AI.

Of 702 benign masses, 554 (79%) imaged with SOC US were correctly assessed as benign or probably benign by AI, including 43 masses that were considered suspicious (BI-RADS 4A by the radiologist) and biopsied clinically. Specificity was much lower for AI with use of the portable US images, at 340 of 702 benign masses (48%, P < .001).

Table 1: Performance of AI on Breast Masses Imaged with Portable Low-Cost US and Minimally Trained Observers versus SOC Equipment and Performance by a Specialist Radiologist

| Performance Metric | AI Portable US* | AI SOC US* | Difference | Radiologist BI-RADS |
|---|---|---|---|---|
| Participant-level analysis (300 index masses in 300 women) | | | | |
| AUC | 0.91 (0.85, 0.97) | 0.95 (0.92, 0.98) | −0.04 (−0.08, 0), P = .06 | 0.96 (0.94, 0.98) |
| Sensitivity | 0.96 [47/49] (0.90, 1) | 0.98 [48/49] (0.94, 1) | −0.02 (−0.06, 0.02), P = .31 | 1 [49/49] (1, 1) |
| Specificity | 0.38 [96/251] (0.32, 0.44) | 0.67 [168/251] (0.61, 0.73) | −0.29 (−0.35, −0.22), P < .001 | 0.68 [170/251] (0.62, 0.74) |
| NPV | 0.98 [96/98] (0.95, 1) | 0.99 [168/169] (0.98, 1) | −0.01 (−0.04, 0.01), P = .19 | 1 [170/170] (1, 1) |
| Lesion-level analysis (758 masses in 300 women) | | | | |
| AUC | 0.92 (0.87, 0.97) | 0.95 (0.91, 0.99) | −0.03 (−0.07, 0.01), P = .10 | 0.98 (0.97, 0.99) |
| Sensitivity | 0.95 [53/56] (0.89, 1) | 0.95 [53/56] (0.89, 1) | 0 (−0.05, 0.05), P = 1 | 1 [56/56] (1, 1) |
| Specificity | 0.48 [340/702] (0.44, 0.53) | 0.79 [554/702] (0.76, 0.82) | −0.3 (−0.35, −0.26), P < .001 | 0.87 [612/702] (0.84, 0.9) |
| NPV | 0.99 [340/343] (0.98, 1) | 0.99 [554/557] (0.99, 1) | 0 (−0.01, 0), P = .37 | 1 [612/612] (1, 1) |

Note.—Data in brackets are numbers of participants or lesions (as noted), and data in parentheses are 95% CIs. AI = artificial intelligence, AUC = area under the receiver operating characteristic curve, BI-RADS = Breast Imaging Reporting and Data System, NPV = negative predictive value, SOC = standard of care.
* Portable US was performed by either a specialist radiologist or minimally trained observers with use of a Vscan Extend unit equipped with a 3.3–8.0 MHz linear-array transducer (GE HealthCare); SOC US was performed by a radiologist specializing in breast imaging with use of an Aloka system equipped with an 8.0–13 MHz linear-array transducer (Hitachi Medical).

Operator Dependence
Considering the subset of 204 women with portable US images obtained by the radiologist, there were 603 analyzable lesions (35 [5.8%] were malignant). The AUC of AI was 0.98, sensitivity was 97%–100% (34 or 35 of 35 malignant lesions), and specificity was 52%–80% (296–457 of 568 benign lesions) (Table 2). These results were significantly better (all P < .001) than for portable US images obtained by minimally trained research coordinators in the subset of 155 analyzable lesions (21 [13.5%] were malignant) in 96 women. The AUC of AI for this subset was 0.78, sensitivity was 86% (18 of 21 malignant lesions), and specificity was 33% (44 of 134 benign lesions). Results from images obtained with portable US were generally not different from those with SOC equipment when distinguished by operator, except that specificity remained significantly lower with portable US images (Table 2).

Discussion
In this analysis, 47 or 48 of 49 women (96%–98%) with cancer depicted by use of US and 96–168 of 251 women (38%–67%) without cancer would have been triaged appropriately by Koios DS artificial intelligence (AI) software, with better
performance with standard-of-care (SOC) US equipment than with low-cost portable US. By analyzing 758 breast masses in women with findings seen with US, we found AI in isolation performed well, with the area under the receiver operating characteristic curve (AUC) exceeding 0.95 with images obtained with SOC US equipment and sensitivity of 95% or higher. Importantly, nearly equivalent AUC and sensitivity, at 95% or higher, but lower specificity, was observed for AI applied to images obtained by a radiologist with low-cost portable US. AI applied to images obtained by minimally trained research coordinators with low-cost portable US showed suboptimal performance, indicating the need for greater training of personnel. Training of the AI software with images from low-cost portable US should improve specificity.

The few malignancies erroneously assessed as benign or probably benign by AI in our study did have benign features, and all but one were ductal carcinoma in situ. Delay in diagnosis of low-grade ductal carcinoma in situ is unlikely to adversely affect patient outcomes. The AI software has not been trained on a sufficient volume of US masses due to ductal carcinoma in situ; this is an area for improvement. The one misclassified invasive ductal carcinoma was a circumscribed mass, ipsilateral to a second spiculated mass due to invasive ductal carcinoma and showed strong internal vascularity. Doppler results are not currently considered by AI. The otherwise outstanding performance of the software on invasive breast cancer, even with low-quality images, is highly reassuring.

Figure 2:  Images in a 37-year-old woman show a palpable mass due to low-grade ductal carcinoma in situ. (A) Orthogonal portable US images show a hypoechoic oval mass with subtly indistinct margins (arrows), assessed as probably benign by artificial intelligence (AI). (B) Orthogonal standard-of-care US images show focal microlobulation (arrow) and were assessed as suspicious with AI and as Breast Imaging Reporting and Data System 4A by the radiologist. US-guided core biopsy and excision showed estrogen and progesterone receptor positive low-grade ductal carcinoma in situ.

The current inability to easily triage women with breast lumps appears to be a generalizable challenge in LMICs, with wait times as long as 6 months for evaluation in some settings. An accurate noninvasive portable US device that could be used locally by a minimally trained healthcare worker, concurrent with a clinical breast examination confirming the lump, would ensure that the 10%–20% of women with potentially malignant lesions could be referred urgently for biopsy while a high percentage of women with benign lesions could be safely reassured. Such an approach would ensure that limited trained healthcare personnel and financial resources are focused on those women who would benefit from earlier management and treatment. With use of the handheld portable US images in conjunction with AI, in this study, 33%–52% of women with benign breast lumps corresponding to masses with US could have been triaged remotely, with negative predictive values of 93.6%–100%, with better performance with images obtained by the radiologist and with images from SOC equipment. Use of AI is a much more easily implemented approach than extensive training with breast US, such as the 7-week training of nurses and general practitioners in Rwanda to include interpretation (17). We did find, however, that images obtained by minimally trained observers were not optimal for diagnostic assessment; greater emphasis on US technique is needed. It was reassuring that a specialist could obtain equally diagnostic images even with a low-cost portable US device as with a high-end cart-based system.

The AI used in this study, Koios DS, is intended as decision support to a radiologist and, as stated, has been trained and validated with more than 700 000 breast masses from over 40 centers, representing 17 current high-end breast US platforms from all major US manufacturers. The AI stand-alone performance has previously been shown to be similar to that of specialist radiologists (9,10), with an AUC of 0.77 on a relatively challenging case set of 319 lesions found with screening US (9). Mango et al (10) showed the greatest beneficial impact of AI on physicians outside of radiology on 900 cases, with a stand-alone
performance AUC of 0.876 with low-frequency transducer US images not different from an AUC of 0.893 with images from a high-frequency transducer.

Figure 3:  Images in a 60-year-old woman show two palpable masses in the outer right breast. (A) Orthogonal portable US images of mass in right breast at 8 o’clock axis, 4 cm from the nipple, assessed as suspicious by artificial intelligence (AI). (B) Orthogonal standard-of-care (SOC) US images of the same 19-mm mass show circumscribed margins and posterior enhancement. SOC images of the mass were assessed as probably benign by AI and Breast Imaging Reporting and Data System (BI-RADS) 4A, low suspicion, by the radiologist. (C) Color Doppler US image shows strong internal vascularity. Doppler images are not currently evaluated by AI. (D) Orthogonal US images of second palpable mass in right breast at 8 o’clock axis, 6 cm from the nipple, show an irregular 17-mm hypoechoic spiculated mass with posterior shadowing, assessed as probably malignant by AI and BI-RADS 5 by the radiologist. Histopathologic examination of both masses showed grade 3 invasive ductal carcinoma that was estrogen and progesterone receptor positive and human epidermal growth factor receptor 2 negative.

There are other AI tools developed for breast US. Shen et al (11) developed and validated AI with stand-alone AUC of 0.976 on a test set of more than 44 000 examinations. When radiologists retrospectively reviewed images with this software, false-positive recalls decreased by 37% and benign biopsies by nearly 28%. S-Detect software (Samsung Medison) performs BI-RADS feature extraction with outputs of possibly malignant or possibly benign. Inexperienced radiologists benefit most from this approach (18), with greatest improvements in specificity (19–23).

Current AI assessments of each lesion are independent and do not consider the influence of concurrent lesions elsewhere in the same woman. In the American College of Radiology Imaging Network 6666 protocol, multiple bilateral circumscribed oval masses (assessed as a “single” overall finding) could be safely assessed as benign, BI-RADS 2, with 0 of 153 such lesions malignant (95% CI: 0, 2.4) in 135 women (24). Among these 135 women, 82 also had a solitary suspicious mass, with two of those malignant. In a patient with concurrent malignancy, otherwise BI-RADS 3 masses overall had an 11% rate of malignancy in the series by Kim et al (13). When in the same quadrant as the known cancer, 21.2% (36 of 170 masses) were malignant, as were 9.8% of ipsilateral masses (12 of 122) in a different quadrant and 4.2% (eight of 190) in the contralateral breast. Augmenting current AI to consider concurrent breast lesions in the ipsilateral or contralateral breast may improve performance.


Table 2: Comparison of AI Performance with Breast US Images Obtained by Observers with Differing Experience

| Performance Metric | AI Portable US* | AI SOC US* | Difference | Radiologist BI-RADS |
|---|---|---|---|---|
| Radiologist-performed US (603 masses in 204 women) | | | | |
| AUC | 0.98 (0.95, 1.0) | 0.98 (0.96, 1.0) | 0 (−0.02, 0.01), P = .77 | 0.99 (0.99, 1.0) |
| Sensitivity | 1.0 [35/35] (1, 1.0) | 0.97 [34/35] (0.92, 1) | 0.03 (−0.03, 0.08), P = .3 | 1.0 [35/35] (1.0, 1.0) |
| Specificity | 0.52 [296/568] (0.47, 0.57) | 0.80 [457/568] (0.77, 0.84) | −0.28 (−0.33, −0.23), P < .001 | 0.94 [532/568] (0.91, 0.96) |
| NPV | 1.0 [296/296] (1, 1.0) | 1.0 [457/458] (0.99, 1.0) | 0 (0, 0.01), P = .32 | 1.0 [532/532] (1, 1.0) |
| Minimally trained observer–performed portable US (155 masses in 96 women)* | | | | |
| AUC | 0.78 (0.64, 0.91) | 0.89 (0.79, 0.98) | −0.11 (−0.21, 0), P = .04 | 0.91 (0.86, 0.97) |
| Sensitivity | 0.86 [18/21] (0.71, 1) | 0.90 [19/21] (0.78, 1) | −0.05 (−0.14, 0.04), P = .31 | 1.0 [21/21] (1.0, 1.0) |
| Specificity | 0.33 [44/134] (0.24, 0.42) | 0.72 [97/134] (0.64, 0.81) | −0.4 (−0.48, −0.31), P < .001 | 0.60 [80/134] (0.46, 0.74) |
| NPV | 0.94 [44/47] (0.86, 1) | 0.98 [97/99] (0.95, 1) | −0.04 (−0.1, 0.01), P = .11 | 1 [80/80] (1, 1) |

Note.—Data in brackets are numbers of masses, and data in parentheses are 95% CIs. AI = artificial intelligence, AUC = area under the receiver operating characteristic curve, BI-RADS = Breast Imaging Reporting and Data System, NPV = negative predictive value, SOC = standard of care.
* Portable US was performed by either a specialist radiologist or minimally trained observers with use of a Vscan Extend unit equipped with a 3.3–8.0 MHz linear-array transducer (GE HealthCare); SOC US was performed by a radiologist specializing in breast imaging with use of an Aloka system equipped with an 8.0–13 MHz linear-array transducer (Hitachi Medical).
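
The bracketed counts in Table 2 can be turned into sensitivity, specificity, and negative predictive value with a short Python sketch. It uses simple proportions, which is a simplification: the study estimated these metrics and their CIs with hierarchical Poisson regression and generalized estimating equations, so the sketch recovers the point estimates but not the intervals. The `Counts` class and the variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """2 x 2 counts for AI output versus reference standard (hypothetical)."""
    tp: int  # malignant, assessed suspicious or malignant
    fn: int  # malignant, assessed benign or probably benign
    tn: int  # benign, assessed benign or probably benign
    fp: int  # benign, assessed suspicious or malignant

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# AI on portable US images, counts taken from the brackets in Table 2.
radiologist_acquired = Counts(tp=35, fn=0, tn=296, fp=568 - 296)
coordinator_acquired = Counts(tp=18, fn=3, tn=44, fp=134 - 44)

for label, counts in [
    ("radiologist", radiologist_acquired),
    ("minimally trained observer", coordinator_acquired),
]:
    print(
        f"{label}: sensitivity={counts.sensitivity:.2f}, "
        f"specificity={counts.specificity:.2f}, NPV={counts.npv:.2f}"
    )
```

The difference in specificity between the two operator groups is the quantity of interest for triage, because it determines how many women with benign masses could avoid referral.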

There were limitations to our study. The Vscan Extend US system was never approved by the U.S. Food and Drug Administration for clinical breast US work, and there was no training of the Koios DS algorithms with such images. The low spatial resolution of the transducer used, combined with the lack of system training of the software, likely explain the reduced specificity observed with images obtained on this portable system. GE HealthCare has since updated this portable handheld platform, and the new Vscan Air is equipped with a wireless higher frequency (L3–12 MHz), wider footprint (40 mm) transducer that is approved by the Food and Drug Administration for breast imaging. These system specifications are now similar to the SOC equipment used in American College of Radiology Imaging Network 6666 protocol (25) and should improve diagnostic performance of both the radiologist and the AI and allow better imaging of larger masses. There are other handheld low-cost wireless US systems with similar specifications currently available. The radiologist’s performance in this study appears artificially high, with 100% sensitivity and specificity as high as 87%, in part because of lack of follow-up and exclusion of many masses assessed as negative, benign, or probably benign. Portable US images were not interpreted by a radiologist, so we do not know how the AI performance with those images compares to that of a radiologist. We did not include risk factors, clinical features such as patient age, or findings such as skin retraction or nipple discharge. Doppler and elastography are not currently evaluated with the AI software, and we excluded lymph nodes, skin lesions, and normal tissue areas as the software has not yet been trained on those types of findings.

In conclusion, radiologists using low-cost portable handheld US can generate images of breast masses adequate for accurate artificial intelligence (AI) classification. Although specificity was less than with standard-of-care equipment, AI applied to portable breast US can potentially reduce about half of unnecessary referrals for benign lesions in resource-limited regions. These favorable results were observed despite lack of training of the AI software with images from the device used, and current portable US has improved specifications. We did not show that untrained observers could produce adequately diagnostic images. Additional training of affiliated healthcare workers with image acquisition, improved equipment, and further system training of AI software on such masses is expected to further improve overall performance and allow effective triage of women with palpable lumps in low- and middle-income countries.

Acknowledgments: The authors are grateful to GE HealthCare for providing the Vscan portable US systems used in acquisition of images and to Koios Medical for providing the AI software.

Author contributions: Guarantors of integrity of entire study, W.A.B., A.L.L.A., A.J., J.C.L.P., C.Y.G.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, W.A.B., A.L.L.A., A.J., L.H.L.; clinical studies, A.L.L.A., A.J., J.C.L.P., C.Y.G., L.H.L., M.T.S.d.L., S.L.; experimental studies, A.J., J.C.L.P.; statistical analysis, W.A.B., A.J., R.C.M., S.Y.C.; and manuscript editing, W.A.B., A.J., R.C.M., S.Y.C., L.H.L.

Disclosures of conflicts of interest: W.A.B. Received grant support to the Department of Radiology from Koios Medical for a separate study where she is the principal investigator; voluntary Chief Scientific Advisor to DenseBreast-info.org. A.L.L.A. No relevant relationships. A.J. Employee of Koios Medical. J.C.L.P. No relevant relationships. C.Y.G. No relevant relationships. R.C.M. Employee of Koios Medical. S.Y.C. No relevant relationships. L.H.L. No relevant relationships. M.T.S.d.L. No relevant relationships. S.L. No relevant relationships.

References
1. Anderson BO, Distelhorst SR. Guidelines for International Breast Health and Cancer Control--Implementation. Introduction. Cancer 2008;113(8 Suppl):2215–2216.
2. Palacio-Mejía LS, Lazcano-Ponce E, Allen-Leigh B, Hernández-Ávila M. Regional differences in breast and cervical cancer mortality in Mexico between 1979-2006 [in Spanish]. Salud Publica Mex 2009;51(Suppl 2):s208–s219.
3. Lei S, Zheng R, Zhang S, et al. Global patterns of breast cancer incidence and mortality: A population-based cancer registry data analysis from 2000 to 2020. Cancer Commun (Lond) 2021;41(11):1183–1194.
4. Ginsburg O, Yip CH, Brooks A, et al. Breast cancer early detection: A phased approach to implementation. Cancer 2020;126(Suppl 10):2379–2393.
5. Sterns EE. Age-related breast diagnosis. Can J Surg 1992;35(1):41–45.
6. Houssami N, Ciatto S, Irwig L, Simpson JM, Macaskill P. The comparative sensitivity of mammography and ultrasound in women with breast symptoms: an age-specific analysis. Breast 2002;11(2):125–130.
7. Lehman CD, Lee CI, Loving VA, Portillo MS, Peacock S, DeMartini WB. Accuracy and value of breast ultrasound for primary imaging evaluation of symptomatic women 30-39 years of age. AJR Am J Roentgenol 2012;199(5):1169–1177.
8. Love SM, Berg WA, Podilchuk C, et al. Palpable Breast Lump Triage by Minimally Trained Operators in Mexico Using Computer-Assisted Diagnosis and Low-Cost Ultrasound. J Glob Oncol 2018;4(4):1–9.
9. Berg WA, Gur D, Bandos AI, et al. Impact of Original and Artificially Improved Artificial Intelligence–based Computer-aided Diagnosis on Breast US Interpretation. J Breast Imaging 2021;3(3):301–311.
10. Mango VL, Sun M, Wynn RT, Ha R. Should We Ignore, Follow, or Biopsy? Impact of Artificial Intelligence Decision Support on Breast Ultrasound Lesion Assessment. AJR Am J Roentgenol 2020;214(6):1445–1452.
11. Shen Y, Shamout FE, Oliver JR, et al. Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams. Nat Commun 2021;12(1):5645.
12. Mendelson EB, Böhm-Vélez M, Berg WA, et al. ACR BI-RADS Ultrasound. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. Reston, Va: American College of Radiology, 2013.
13. Kim SJ, Ko EY, Shin JH, et al. Application of sonographic BI-RADS to synchronous breast nodules detected in patients with breast cancer. AJR Am J Roentgenol 2008;191(3):653–658.
14. Sickles EA, D’Orsi CJ. Follow-up and outcome monitoring. ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. Reston, Va: American College of Radiology, 2013.
15. Sternberg MR, Hadgu A. A GEE approach to estimating sensitivity and specificity and coverage properties of the confidence intervals. Stat Med 2001;20(9-10):1529–1539.
16. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44(3):837–845.
17. Pace LE, Dusengimana JV, Hategekimana V, et al. Clinical Diagnoses and Outcomes After Diagnostic Breast Ultrasound by Nurses and General Practitioner Physicians in Rural Rwanda. J Am Coll Radiol 2022;19(8):983–989.
18. Park HJ, Kim SM, La Yun B, et al. A computer-aided diagnosis system using artificial intelligence for the diagnosis and characterization of breast masses on ultrasound: Added value for the inexperienced breast radiologist. Medicine (Baltimore) 2019;98(3):e14146.
19. Choi JS, Han BK, Ko ES, et al. Effect of a Deep Learning Framework-Based Computer-Aided Diagnosis System on the Diagnostic Performance of Radiologists in Differentiating between Malignant and Benign Masses on Breast Ultrasonography. Korean J Radiol 2019;20(5):749–758.
20. Choi JH, Kang BJ, Baek JE, Lee HS, Kim SH. Application of computer-aided diagnosis in breast ultrasound interpretation: improvements in diagnostic performance according to reader experience. Ultrasonography 2018;37(3):217–225.
21. Wu JY, Zhao ZZ, Zhang WY, et al. Computer-Aided Diagnosis of Solid Breast Lesions With Ultrasound: Factors Associated With False-negative and False-positive Results. J Ultrasound Med 2019;38(12):3193–3202.
22. Kim S, Choi Y, Kim E, et al. Deep learning-based computer-aided diagnosis in screening breast ultrasound to reduce false-positive diagnoses. Sci Rep 2021;11(1):395.
23. Nicosia L, Addante F, Bozzini AC, et al. Evaluation of computer-aided diagnosis in breast ultrasonography: Improvement in diagnostic performance of inexperienced radiologists. Clin Imaging 2022;82:150–155.
24. Berg WA, Zhang Z, Cormack JB, Mendelson EB. Multiple bilateral circumscribed masses at screening breast US: consider annual follow-up. Radiology 2013;268(3):673–683.
25. Berg WA, Blume JD, Cormack JB, et al. Combined screening with ultrasound and mammography vs mammography alone in women at elevated risk of breast cancer. JAMA 2008;299(18):2151–2163.
