Professional Documents
Culture Documents
Background: Mammographic density improves the accuracy of breast cancer risk models. However, the use of breast density is lim-
ited by subjective assessment, variation across radiologists, and restricted data. A mammography-based deep learning (DL) model
may provide more accurate risk prediction.
Purpose: To develop a mammography-based DL breast cancer risk model that is more accurate than established clinical breast can-
cer risk models.
Materials and Methods: This retrospective study included 88 994 consecutive screening mammograms in 39 571 women between
January 1, 2009, and December 31, 2012. For each patient, all examinations were assigned to either training, validation, or test
sets, resulting in 71 689, 8554, and 8751 examinations, respectively. Cancer outcomes were obtained through linkage to a regional
tumor registry. By using risk factor information from patient questionnaires and electronic medical records review, three models
were developed to assess breast cancer risk within 5 years: a risk-factor-based logistic regression model (RF-LR) that used traditional
risk factors, a DL model (image-only DL) that used mammograms alone, and a hybrid DL model that used both traditional risk
factors and mammograms. Comparisons were made to an established breast cancer risk model that included breast density (Tyrer-
Cuzick model, version 8 [TC]). Model performance was compared by using areas under the receiver operating characteristic curve
(AUCs) with DeLong test (P , .05).
Results: The test set included 3937 women, aged 56.20 years 6 10.04. Hybrid DL and image-only DL showed AUCs of 0.70
(95% confidence interval [CI]: 0.66, 0.75) and 0.68 (95% CI: 0.64, 0.73), respectively. RF-LR and TC showed AUCs of 0.67
(95% CI: 0.62, 0.72) and 0.62 (95% CI: 0.57, 0.66), respectively. Hybrid DL showed a significantly higher AUC (0.70) than TC
(0.62; P , .001) and RF-LR (0.67; P = .01).
Conclusion: Deep learning models that use full-field mammograms yield substantially improved risk discrimination compared with
the Tyrer-Cuzick (version 8) model.
© RSNA, 2019
This copy is for personal use only. To order printed copies, contact reprints@rsna.org
Deep Learning Mammography-based Model for Breast Cancer Risk Prediction
Results
We generated a detailed breakdown of available risk factor in-
formation and outcomes for the training, validation, and test
sets (Tables 1 and E1 [online]). Risk factors used in TC, RF-
LR, and hybrid DL included age, weight, height, menarche
age, menopausal status, detailed family history of breast and
ovarian cancer, BRCA mutation status, history of atypical
hyperplasia, history of lobular carcinoma in situ, and breast
density. Of the 80 243 mammographic examinations used for
training and validation, 3045 (3.8%) were followed by a cancer
diagnosis within 5 years. Of the 8751 mammographic exami-
nations used for testing, 269 (3.1%) were followed by a cancer
diagnosis within 5 years.
Model Evaluation
Table 1: Patient Characteristics and Outcomes in Training, Development Validation, and Test Sets
Characteristic Data Set Cancer Data Set Cancer Data Set Cancer
All patients 71 689 (100) 2729 (3.8) 8554 (100) 316 (3.7) 8751 (100) 269 (3.1)
Age (y)
.40 2360 (3.3) 49 (2.1) 268 (3.1) 6 (2.2) 290 (3.3) 1 (0.3)
40–50 20 640 (28.8) 596 (2.9) 2464 (28.8) 73 (3.0) 2620 (29.9) 51 (1.9)
50–60 22 630 (31.6) 686 (3.0) 2750 (32.1) 87 (3.2) 2778 (31.7) 88 (3.2)
60–70 18 937 (26.4) 896 (4.7) 2247 (26.3) 96 (4.3) 2277 (26.0) 72 (3.2)
70–80 6347 (8.9) 382 (6.0) 741 (8.7) 44 (5.9) 731 (8.4) 46 (6.3)
.80 775 (1.1) 120 (15.5) 84 (1.0) 10 (11.9) 55 (0.6) 11 (20.0)
Density
Almost entirely fatty 6073 (8.5) 158 (2.6) 698 (8.2) 25 (3.6) 737 (8.4) 6 (0.8)
Scattered areas of 34 143 (47.6) 1306 (3.8) 4156 (48.6) 141 (3.4) 4126 (47.1) 117 (2.8)
fibroglandular tissue
Heterogeneously dense 27 897 (38.9) 1132 (4.1) 3240 (37.9) 131 (4.0) 3473 (39.7) 135 (3.9)
Extremely dense 3530 (4.9) 132 (3.7) 454 (5.3) 19 (4.2) 411 (4.7) 11 (2.7)
Unknown 46 (0.1) 1 (2.2) 6 (0.1) 0 (0) 4 (0) 0 (0)
Note.—Data are the number of examinations in each group; data in parentheses are percentages.
Table 2: Risk Test Set for All 5-year Risk Assessment Models
Parameter AUC
Ethnicity
White
TC 0.62 (0.57, 0.67)
RF-LR 0.66 (0.61, 0.72)
Image-only DL 0.69 (0.65, 0.74)
Hybrid DL 0.71 (0.66, 0.75)
African American
TC 0.45 (0.21, 0.66)
RF-LR 0.58 (0.33, 0.81)
Image-only DL 0.69 (0.55, 0.92)
Figure 2: Receiver operating characteristic curve of all models on Hybrid DL 0.71 (0.55, 0.89)
the test set. All P values are comparisons with Tyrer-Cuzick version 8 Note.—Data in parentheses are 95% confidence intervals. In the
(TCv8). DL = deep learning, hybrid DL = DL model that uses both im- 3157 patients who were white, there were 7107 examinations
aging and the traditional risk factors in risk factor logistic regression, and 233 cancers; in the 202 patients who were African American,
RF-LR = risk factor logistic regression. there were 424 examinations and 11 cancers. AUC = area under
receiver operator characteristic curve, DL = deep learning,
RF-LR = risk-factor-based logistic regression, TC = Tyrer-Cuzick.
Table E3 [online]). The improvement of hybrid DL over TC was
significant (P , .01). For patients without a family history of 0.77] and 0.71 [95% CI: 0.66, 0.77] respectively), and com-
breast or ovarian cancer, hybrid DL and image-only DL showed pared with an AUC of 0.66 (95% CI: 0.60, 0.73) at TC. The
similar discrimination accuracies (AUCs, 0.71 [95% CI: 0.65, improvement of hybrid DL and image-only DL over TC was
It will be important to investigate what kind of imaging Other relationships: disclosed no relevant relationships. T.P. Activities related to
the present article: MIT and MGH have patents filed on the deep learning models.
patterns hybrid DL relies on to predict cancer risk. When we Activities not related to the present article: disclosed no relevant relationships. Other
observed mammography from the cases in which hybrid DL relationships: disclosed no relevant relationships. R.B. Activities related to the pres-
and TC disagreed on the risk of a patient, we found that the ent article: MIT and MGH have patents filed on the deep learning models; dis-
closed money to author’s institution for grant from Breast Cancer Alliance, Susan G.
model was not relying on a simple density measurement to de- Komen. Activities not related to the present article: disclosed no relevant relation-
termine risk. We speculate that the model may rely on different ships. Other relationships: disclosed no relevant relationships.
fine-grain tissue patterns and relative orientations of those pat-
terns depending on global patterns in a patient’s breast, and that References
1. Gail MH, Brinton LA, Byar DP, et al. Projecting individualized probabilities of de-
there are distinguishing patterns for both women with dense and veloping breast cancer for white females who are being examined annually. J Natl
nondense breasts. Whereas methods exist (21–24) for obtaining Cancer Inst 1989;81(24):1879–1886.
2. Ross RK, Paganini-Hill A, Wan PC, Pike MC. Effect of hormone replacement ther-
saliency maps at the instance level (ie, an explanation specific to apy on breast cancer risk: estrogen versus estrogen plus progestin. J Natl Cancer Inst
an individual mammogram), further work will be required to 2000;92(4):328–332.
3. Claus EB, Risch N, Thompson WD. The calculation of breast cancer risk for
obtain the patterns that are most informative across the entire women with a first degree family history of ovarian cancer. Breast Cancer Res Treat
test set. 1993;28(2):115–120.
4. Tyrer J, Duffy SW, Cuzick J. A breast cancer prediction model incorporating familial
Our study had limitations. We used patient data from a single and personal risk factors. Stat Med 2004;23(7):1111–1130.
tertiary academic institution and mammograms captured by us- 5. Tice JA, Cummings SR, Ziv E, Kerlikowske K. Mammographic breast density and
the Gail model for breast cancer risk prediction in a screening population. Breast
ing a single vendor (Hologic). Also, some patients were missing Cancer Res Treat 2005;94(2):115–122.
risk factor information, though this limitation is common in 6. Reiner AS, Sisti J, John EM, et al. Breast cancer family history and contralateral
breast cancer risk in young women: an update from the women’s environmental
both clinical practice and previous studies (1,3–5). cancer and radiation epidemiology study. J Clin Oncol 2018;36(15):1513–1520.
In conclusion, a deep learning (DL) model that directly 7. Collins A, Politopoulos I. The genetics of breast cancer: risk factors for disease. Appl
Clin Genet 2011;4:11–19.
leverages full-field mammograms in addition to traditional 8. Brentnall AR, Harkness EF, Astley SM, et al. Mammographic density adds accuracy
risk factors outperforms the Tyrer-Cuzick model (version 8) to both the Tyrer-Cuzick and Gail breast cancer risk models in a prospective UK
screening cohort. Breast Cancer Res 2015;17(1):147.
by a large margin; this improvement is consistent across de- 9. Sprague BL, Conant EF, Onega T, et al. Variation in mammographic breast density
mographic subgroups. These results support the hypothesis assessments among radiologists in clinical practice: a multicenter observational study.
Ann Intern Med 2016;165(7):457–464.
that mammography contains informative indicators of risk 10. Lehman CD, Yala A, Schuster T, et al. Mammographic breast density assessment
not captured by traditional risk factors, and DL models can using deep learning: clinical implementation. Radiology 2019;290(1):52–58.
11. Boyd NF, Byng JW, Jong RA, et al. Quantitative classification of mammographic
deduce these patterns from the data. These models have the densities and breast cancer risk: results from the Canadian National Breast Screening
potential to replace conventional risk prediction models. Fur- Study. J Natl Cancer Inst 1995;87(9):670–675.
12. Brandt KR, Scott CG, Ma L, et al. Comparison of clinical and automated breast
ther research is required to validate our model across institu- density measurements: implications for risk prediction and supplemental screening.
tions and vendors before it can be broadly implemented, and Radiology 2016;279(3):710–719.
13. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In:
to this end, we made our trained model and code available for Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
research (learningtocure.csail.mit.edu). 2016; 770–778.
14. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to
analyze and compare ROC curves. BMC Bioinformatics 2011;12(1):77.
Acknowledgments: The authors are grateful to Thomas Schultz, BS (Partners 15. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or
Enterprise Medical Imaging, Boston, Mass), Matthew Melesky (MagView Health more correlated receiver operating characteristic curves: a nonparametric approach.
care Information Solutions), and the members of the Partners Enterprise Medical Biometrics 1988;44(3):837–845.
Imaging team for their support of this project in image and data management. 16. Field CA, Welsh AH. Bootstrapping clustered data. J R Stat Soc Series B Stat Meth-
odol 2007;69(3):369–390.
17. Boggs DA, Rosenberg L, Adams-Campbell LL, Palmer JR. Prospective approach to
Author contributions: Guarantors of integrity of entire study, A.Y., R.B.; study breast cancer risk prediction in African American women: the black women’s health
study model. J Clin Oncol 2015;33(9):1038–1044.
concepts/study design or data acquisition or data analysis/interpretation, all authors; 18. Gail MH. Twenty-five years of breast cancer risk models and their applications. J
manuscript drafting or manuscript revision for important intellectual content, all Natl Cancer Inst 2015;107(5):djv042.
authors; approval of final version of submitted manuscript, all authors; agrees to 19. Gail MH, Costantino JP, Pee D, et al. Projecting individualized absolute invasive
ensure any questions related to the work are appropriately resolved, all authors; breast cancer risk in African American women. J Natl Cancer Inst 2007;99(23):1782–
literature research, A.Y., C.L., T.P., R.B.; clinical studies, C.L.; experimental studies, 1792.
all authors; statistical analysis, A.Y., T.S., T.P.; and manuscript editing, all authors. 20. Matsuno RK, Costantino JP, Ziegler RG, et al. Projecting individualized absolute
invasive breast cancer risk in Asian and Pacific Islander American women. J Natl
Cancer Inst 2011;103(12):951–961.
Disclosures of Conflicts of Interest: A.Y. Activities related to the pres- 21. Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks. arXiv
ent article: MIT and MGH have patents filed on the deep learning models. preprint arXiv:1703.01365. https://arxiv.org/abs/1703.01365. Published March 4,
Activities not related to the present article: disclosed no relevant relationships. Other 2017. Accessed April 4, 2019.
relationships: disclosed no relevant relationships. C.L. Activities related to the pres- 22. Ribeiro MT, Singh S, Guestrin C. Why should I trust you?: Explaining the predic-
ent article: MIT and MGH have patents filed on the deep learning models. Activities tions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International
not related to the present article: disclosed money paid to author for consultancy from Conference on Knowledge Discovery and Data Mining, 2016; 1135–1144.
23. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-cam: Vi-
GE Healthcare; disclosed money to author’s institution for grants/grants pending sual explanations from deep networks via gradient-based localization. In: 2017 IEEE
from GE Healthcare; disclosed that MIT and MGH have patents filed related to the International Conference on Computer Vision (ICCV), 2017; 618–626).
work. Other relationships: disclosed no relevant relationships. T.S. Activities related 24. Shrikumar A, Greenside P, Kundaje A. Learning important features through propa-
to the present article: MIT and MGH have patents filed on the deep learning models. gating activation differences. arXiv preprint arXiv:1704.02685. https://arxiv.org/
Activities not related to the present article: disclosed no relevant relationships. abs/1704.02685. Published April 10, 2017. Accessed April 4, 2019.