
Deep Understanding of Breast Density Classification

Timothy Cogan1 and Lakshman Tamil2

*This work was not supported by any organization.
1 Timothy Cogan is with the Department of Electrical and Computer Engineering, The University of Texas at Dallas, Richardson, TX 75080, USA. timothy.cogan@utdallas.edu
2 Lakshman Tamil is with the Faculty of the Department of Electrical and Computer Engineering, The University of Texas at Dallas, Richardson, TX 75080, USA. laxman@utdallas.edu

Abstract— We have developed a deep learning architecture, DualViewNet, for mammogram density classification, as well as a novel metric for quantifying network preference of mediolateral oblique (MLO) versus craniocaudal (CC) views in density classification. We have also provided thorough analysis and visualization to better understand the behavior of deep neural networks in density classification. Our proposed architecture, DualViewNet, simultaneously examines and classifies both MLO and CC views corresponding to the same breast, and shows best performance with a macro average AUC of 0.8970 and macro average 95% confidence interval of 0.8239-0.9450 obtained via bootstrapping 1,000 test sets. By leveraging DualViewNet we provide a novel algorithm and quantitative comparison of MLO versus CC views for classification and find that MLO provides stronger influence in 1,187 out of 1,323 breasts. This work provides insight into applying deep learning for breast density classification.

I. INTRODUCTION

Recent advancements in deep learning for computer vision have led to state-of-the-art classifiers on par with or better than human performance in certain areas. Because of these recent advancements, there has been growing interest in AI-assisted medical image analysis [1], including mammogram analysis [2]. With respect to mammogram analysis, breast density classification is important due to the association between density and risk of breast cancer. Typically, fatty breasts are at lower risk for cancer while dense breasts are at higher risk [3], [4]. The Breast Imaging-Reporting and Data System (BI-RADS) categorizes breasts into four density classes: fatty, sparsely dense, heterogeneously dense, and extremely dense. Sometimes, density will be discussed with binary labels of low density (fatty and sparsely dense) and high density (heterogeneously and extremely dense) [5], [6].

In a recent study, Wu et al. used over 200,000 screening exams in developing a convolutional neural network for mammogram density classification. This network accepts 4 mammograms, craniocaudal and mediolateral oblique for the left and right breasts, as inputs, and each mammogram is processed by separate convolutional and pooling layers. The feature sets from all 4 images are concatenated and passed to a fully connected layer for classification. The network achieved a macro average area under curve of 0.934, where the macro average area under curve is defined by averaging the area under curve for differentiating class 1 versus all 3 other classes, class 2 versus all 3 other classes, et cetera [7]. Mohamed et al. developed networks based on AlexNet and compared classification performance on mediolateral oblique versus craniocaudal mammograms. They achieved 0.95 area under curve for mediolateral oblique mammograms and 0.88 area under curve for craniocaudal mammograms when differentiating sparsely dense and heterogeneously dense mammograms. They also achieved 0.97 area under curve for mediolateral oblique mammograms and 0.92 area under curve for craniocaudal mammograms when differentiating dense (heterogeneously plus extremely dense) and non-dense (fatty plus sparsely dense) mammograms [8]. Another study by Mohamed et al. compared the effects of pre-training AlexNet with ImageNet images versus training from scratch. When training with 2,000 mammograms, a slightly higher area under curve of 0.9223 versus 0.9201 was achieved with the pre-trained network versus the from-scratch network, but when training with 6,000 mammograms these values became 0.9256 and 0.9455, respectively. The authors concluded that there is not a significant gain from pretraining [9]. Lee et al. used VGG16-based networks for image segmentation followed by percent density estimation. In this study, a first network classified each pixel as breast or non-breast while a second network classified each breast pixel as dense or not. The ratio between total breast and dense pixels was used to approximate percent density. The percent density estimates showed a Pearson's correlation of 0.85 with BI-RADS categories [10]. Gandomkar et al. evaluated an Inception-v3 network pretrained on ImageNet data. The network was trained on 3,813 mammograms and then evaluated on 150 mammograms. For classifying fatty versus dense breasts (classes 1 and 2 versus 3 and 4), this network achieved 92.0% accuracy [11]. Lehman et al. developed a deep learning network and tested the acceptability of this model in a clinical setting. The network was based on ResNet-18, trained on 41,479 mammograms, and tested on 8,677 mammograms. Furthermore, 10,763 mammograms were compared against assessments by five different breast imagers in a real-time clinical evaluation. Of these 10,763 classifications, the deep learning model and interpreting radiologist agreed on 9,729 classifications [6].

II. METHODS

A. Dataset

Our training, validation, and test mammograms were obtained from the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM). The CBIS-DDSM, as the name describes, is a curated subset of the Digital Database for Screening Mammography, a dataset of mammograms provided by the University of South Florida. Mammograms within the CBIS-DDSM have been selected by a trained mammographer, converted to DICOM format, include updated metadata, and have defined splits for training and testing [12], [13], [14].


TABLE I
Training parameters used for MobileNetV2 and DualViewNet

batch size | optimizer | learning rate | momentum
8          | SGD       | 0.001         | 0.9

B. Preprocessing

We applied a series of preprocessing techniques to the mammograms prior to training and testing. For both training and testing, mammograms were colormapped from 16-bit greyscale to 24-bit RGB via the magma colormapping scheme. We postulated that MobileNetV2 layers would perform better on colormapped mammograms since these layers were pretrained on RGB images from ImageNet, and we used magma colormapping because it is perceptually uniform [15], [2]. For validation and testing, images were then resized and center cropped to 336x224. MobileNetV2 was pretrained on 224x224 images, but we used an input size of 336x224 to accommodate the typical aspect ratio of mammograms. For training, images were cropped to a random size and aspect ratio, resized to 336x224, and then randomly flipped horizontally to provide dataset augmentation for reducing overfitting. Lastly, mammogram pixel values were normalized to a mean of [0.485, 0.456, 0.406] and standard deviation of [0.229, 0.224, 0.225], as was done during pretraining [16].
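For concreteness, the following is a minimal sketch of such a preprocessing pipeline, assuming matplotlib's magma colormap and torchvision transforms. The helper name, the exact resize policy, and the (height, width) = (336, 224) ordering are illustrative assumptions rather than the exact implementation used in this work.

# Sketch of the preprocessing above (illustrative, not the exact code used here).
# Assumptions: mammograms arrive as 16-bit grayscale numpy arrays, 336x224 is
# interpreted as (height, width), and "resize then center crop" is approximated
# by resizing the short edge before cropping.
import numpy as np
from matplotlib import cm
from PIL import Image
from torchvision import transforms

def magma_to_rgb(mammogram_16bit):
    """Colormap a 16-bit grayscale mammogram to a 24-bit RGB PIL image."""
    scaled = mammogram_16bit.astype(np.float32) / 65535.0          # to [0, 1]
    rgb = (cm.magma(scaled)[..., :3] * 255).astype(np.uint8)       # drop alpha
    return Image.fromarray(rgb)

imagenet_norm = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

# Validation/testing: resize, then center crop to 336x224.
eval_transform = transforms.Compose([
    transforms.Resize(336),
    transforms.CenterCrop((336, 224)),
    transforms.ToTensor(),
    imagenet_norm,
])

# Training: random crop size/aspect ratio, resize to 336x224, random flip.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop((336, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    imagenet_norm,
])

# Example usage: tensor = train_transform(magma_to_rgb(raw_mammogram))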
C. MobileNetV2

MobileNetV2 is a deep neural network which has proven effective in a variety of image classification tasks. It was developed to provide high performance at fast speeds via a reduced set of parameters as compared to other deep neural networks typically used in image classification. Although classification speed is not necessarily important in density classification, we selected the MobileNetV2 architecture for density classification due to our limited dataset of mammograms. Because MobileNetV2 has fewer parameters than other networks, we hypothesized that it is less likely to overfit on smaller datasets such as ours. As a reference, whereas a MobileNetV2 implementation can have as few as 2.11 million parameters, ResNet-101 has 58.16 million parameters [17].
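As a rough illustration of this size difference, parameter counts can be checked directly with torchvision's stock models; note that the stock MobileNetV2 uses the default width multiplier, so its total is larger than the reduced-width 2.11 million figure quoted above.

# Parameter counts with torchvision's stock models (illustrative only).
import torchvision.models as models

def count_parameters(model):
    return sum(p.numel() for p in model.parameters())

print(f"MobileNetV2: {count_parameters(models.mobilenet_v2()) / 1e6:.2f}M parameters")
print(f"ResNet-101:  {count_parameters(models.resnet101()) / 1e6:.2f}M parameters")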
D. DualViewNet

We present a network architecture which we call DualViewNet to perform joint classification on MLO and CC mammograms corresponding to the same breast. As indicated by the work of Mohamed et al., the MLO view is easier to classify than the CC view, and we conjecture that a network which considers both views simultaneously will have more accuracy on a breast-by-breast basis than a network which considers only one view at a time [8]. Our architecture is similar to that of Wu et al. except that our network accepts only MLO and CC views of a single breast rather than MLO and CC views of both breasts from a single patient. That is, our network predicts density on a breast-by-breast rather than patient-by-patient basis [7]. DualViewNet passes the MLO and CC mammograms to separate convolutional layers based on MobileNetV2, although convolutional layers from any state-of-the-art image classifier could be used. Two sets of features output from the convolutional layers are then concatenated and passed into a classifier which outputs probabilities for each of the density classes. For a visual representation, the DualViewNet architecture can be seen in figure 1. Both networks were developed using PyTorch [18], and training parameters for both MobileNetV2 and DualViewNet can be seen in table I.

Fig. 1. DualViewNet architecture for classifying MLO and CC views simultaneously. The MLO and CC images are passed through separate convolutional layers, but the combined features from both images are used for classification.
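A minimal PyTorch sketch of a dual-view classifier in this spirit follows: two MobileNetV2 backbones, feature concatenation, and a joint classifier, trained with the SGD settings of table I. The pooling choice, classifier size, and class name are assumptions for illustration, not the exact DualViewNet implementation.

# A dual-view classifier in the spirit of DualViewNet (illustrative sketch).
import torch
import torch.nn as nn
from torchvision import models

class DualViewClassifier(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        # Separate ImageNet-pretrained convolutional backbones per view.
        self.mlo_backbone = models.mobilenet_v2(pretrained=True).features
        self.cc_backbone = models.mobilenet_v2(pretrained=True).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        # MobileNetV2 feature maps have 1280 channels; two views -> 2560 features.
        self.classifier = nn.Linear(2 * 1280, num_classes)

    def forward(self, mlo_image, cc_image):
        mlo_features = self.pool(self.mlo_backbone(mlo_image)).flatten(1)
        cc_features = self.pool(self.cc_backbone(cc_image)).flatten(1)
        combined = torch.cat([mlo_features, cc_features], dim=1)
        return self.classifier(combined)  # softmax of these logits gives class probabilities

# Training setup following Table I (batch size 8, SGD, lr 0.001, momentum 0.9).
model = DualViewClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
logits = model(torch.randn(8, 3, 336, 224), torch.randn(8, 3, 336, 224))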


III. RESULTS

Receiver operating characteristics (ROC) for MobileNetV2 and DualViewNet can be seen in figures 2 and 3. MobileNetV2 was trained and tested in three separate conditions of only MLO mammograms, only CC mammograms, and both MLO and CC mammograms. Under these three conditions, MobileNetV2 achieved a macro average area under curve (AUC) of 0.8839 with macro average 95% confidence interval of 0.8188-0.9303, AUC of 0.8794 with confidence interval 0.8114-0.9286, and AUC of 0.8836 with confidence interval 0.8374-0.9196, respectively. DualViewNet achieved a slightly higher macro average AUC of 0.8970 with macro average 95% confidence interval of 0.8239-0.9450. Each of these confidence intervals was obtained by bootstrapping 1,000 sets out of the test set. Although DualViewNet has slightly stronger performance than MobileNetV2, this performance difference is small with respect to the confidence intervals. Also, all networks were trained, validated, and tested with the same dataset, but the mixed-view MobileNetV2 saw twice as many samples as DualViewNet since it operated on an image-by-image rather than breast-by-breast basis. A plot depicting correctly and incorrectly classified mammograms is shown in figure 7.

Fig. 2. Single image classifier (MobileNetV2) receiver operating characteristic curves for the 4 different classes with both MLO and CC views, where 95% confidence intervals were determined by bootstrapping 1,000 image sets.

Fig. 3. Receiver operating characteristic curves for DualViewNet, where confidence intervals were obtained by bootstrapping 1,000 image sets.
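A hedged sketch of this evaluation protocol follows: one-vs-rest AUC per class, macro averaging, and a 95% interval taken from 1,000 bootstrap resamples of the test set. Function and variable names are illustrative, and the exact resampling details here are assumptions.

# Macro-average AUC with a bootstrapped 95% confidence interval (illustrative).
import numpy as np
from sklearn.metrics import roc_auc_score

def macro_auc(labels, probs):
    # labels: (N,) integer density classes 0..3; probs: (N, 4) predicted probabilities
    return np.mean([roc_auc_score(labels == c, probs[:, c]) for c in range(probs.shape[1])])

def bootstrap_macro_auc(labels, probs, n_resamples=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_resamples):
        idx = rng.integers(0, len(labels), len(labels))   # resample with replacement
        if len(np.unique(labels[idx])) < probs.shape[1]:
            continue                                      # skip resamples missing a class
        scores.append(macro_auc(labels[idx], probs[idx]))
    lower, upper = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return np.mean(scores), (lower, upper)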
IV. DISCUSSION

Mohamed et al. [9] found an increase from 0.9421 to 0.9882 area under curve after removing images with suspect labels. Evidently, label accuracy could be a significant issue for mammogram density data. Lehman et al. mention that classification agreement between different radiologists can vary between 62% and 87%, and that even software programs can vary significantly [6]. Gandomkar et al. also mention this issue with respect to potentially poorly labeled training data [11]. Because of subjectivity concerns with respect to breast density labelling, we believe that direct performance comparisons between studies are unfair when these studies have used different datasets [5].

We have focused this study on the visualization and deeper understanding of neural network attention during density classification. Figure 4 provides a set of mammograms and corresponding saliency maps for DualViewNet. Saliency maps provide visualization of which pixels are most influential in classification of a particular image. As shown in figure 4, the network is correctly focused on pixels corresponding to breast tissue. Figure 4 is also interesting because it visualizes DualViewNet's preference for MLO over CC images. In figure 4, saliency maps for the MLO views on the lower right are brighter than saliency maps for the CC views on the lower left of each image group, indicating a strong influence of MLO pixels in classification. This preference for MLO mammograms is consistent with previous research indicating superior classification on MLO rather than CC mammograms [8]. Figure 5 depicts evidence of MLO over CC preference for hundreds of breasts, summarized per breast by the gradient comparison in Algorithm 1. In figure 5, red indicates breasts where the MLO pixels as a whole were valued with higher importance than the CC image pixels, whereas blue indicates breasts where the CC pixels were valued more than the MLO pixels. The MLO view is favored over the CC view in 1,187 out of 1,323 breasts.

Fig. 4. Saliency maps for CC views (left in each group) and MLO views (right in each group) from DualViewNet show preference towards the MLO view in assessing breast density (compare relative pixel brightness).

Fig. 5. Image gradients for MLO and CC views are compared to indicate network preference towards MLO images in classifying breast density. Red indicates that the absolute gradient sum was higher for the MLO view, whereas blue indicates that the absolute gradient sum was higher for the associated CC view.

Algorithm 1: Gradient Summary Algorithm

features = []
colors = []
for mlo_image, cc_image in mammogram_pairs:
    mlo_saliency_map, cc_saliency_map = get_saliency_maps(mlo_image, cc_image)
    # Sum the saliency (absolute gradients) over all rows and columns of each view
    mlo_gradient_sum = mlo_saliency_map.sum()
    cc_gradient_sum = cc_saliency_map.sum()
    if mlo_gradient_sum > cc_gradient_sum:
        color = 'red'
        feature = mlo_gradient_sum / cc_gradient_sum - 1
    else:
        color = 'blue'
        feature = 1 - cc_gradient_sum / mlo_gradient_sum
    features.append(feature)
    colors.append(color)
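The text does not spell out how the saliency maps themselves are computed. One common choice, consistent with the "absolute gradient sum" wording of figure 5, is the absolute gradient of the top predicted class score with respect to the input pixels. The sketch below shows such an implementation of the get_saliency_maps helper assumed in Algorithm 1; the saliency formulation and the extra model argument are assumptions made here for self-containedness.

# One plausible implementation of get_saliency_maps (assumed, not the exact method).
import torch

def get_saliency_maps(model, mlo_image, cc_image):
    model.eval()
    mlo = mlo_image.detach().unsqueeze(0).requires_grad_(True)   # (1, 3, H, W)
    cc = cc_image.detach().unsqueeze(0).requires_grad_(True)
    logits = model(mlo, cc)
    logits[0, logits.argmax()].backward()                        # d(top class score) / d(pixels)
    mlo_saliency = mlo.grad.abs().max(dim=1).values.squeeze(0)   # (H, W), max over RGB channels
    cc_saliency = cc.grad.abs().max(dim=1).values.squeeze(0)
    return mlo_saliency, cc_saliency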

For another visualization of DualViewNet performance, we utilize t-distributed stochastic neighbor embedding (t-SNE) on both raw mammogram pixel values and network-extracted mammogram features. For computational speed, principal component analysis reduced input values down to 32 dimensions prior to applying t-SNE on either the pixel values or the extracted features. t-SNE is a technique for projecting high dimensional features down to 2 or 3 dimensions by maximizing the similarity between the high and low dimensional feature distributions. Similarity is defined by Kullback-Leibler divergences of conditional probabilities based on Euclidean distances in the high and low dimensional spaces [19]. Figure 6 depicts t-SNE performed with pixel values on the left and with extracted features on the right, and shows that the extracted features provide a better representation of the underlying density classes than do the raw pixel values. The two clusters seen in the left image are possibly representative of left versus right mammograms.
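A short sketch of this visualization step, assuming scikit-learn's PCA and TSNE; the function name and input arrays are illustrative.

# PCA to 32 dimensions for speed, then t-SNE to 2 dimensions (illustrative).
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def embed_2d(samples):
    # samples: (N, D) array of flattened mammogram pixels or network features
    reduced = PCA(n_components=32).fit_transform(samples)
    return TSNE(n_components=2).fit_transform(reduced)   # (N, 2) points for plotting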


Fig. 6. t-SNE provides a 2-dimensional understanding of class separation based on raw image pixels (left) and network output features (right).

Fig. 7. Mammograms are placed in bins per reference and predicted labels.

V. CONCLUSION

In a time where there is distrust in computer-based assessments, growing complexity in how algorithms operate, and increasing responsibility placed on software, we believe that visualizing and thereby understanding the operations of algorithms is more important than ever. Our study has provided a novel approach for density classification, and our analysis of DualViewNet has given a quantitative valuation of MLO versus CC preference on a per-breast basis. In addition, DualViewNet provides an architecture by which convolutional layers from other networks (e.g., Inception) could be easily used in place of MobileNetV2 layers. In such alternatives, t-SNE could be used as a qualitative assessment of layer efficacy. We hope that the work we have presented will contribute towards the growing interest in and acceptance of AI assistance in medical image analysis.

ACKNOWLEDGMENT

The DDSM used for training and testing was provided courtesy of the University of South Florida. The authors also acknowledge the Texas Advanced Computing Center (TACC) at the University of Texas at Austin for providing HPC resources. URL: http://www.tacc.utexas.edu

REFERENCES

[1] Timothy Cogan, Maribeth Cogan, and Lakshman Tamil. MAPGI: Accurate identification of anatomical landmarks and diseased tissue in gastrointestinal tract using deep learning. Computers in Biology and Medicine, 111:103351, 2019.
[2] Timothy Cogan, Maribeth Cogan, and Lakshman Tamil. RAMS: Remote and automatic mammogram screening. Computers in Biology and Medicine, 107:18–29, 2019.
[3] Valerie A McCormack and Isabel dos Santos Silva. Breast density and parenchymal patterns as markers of breast cancer risk: a meta-analysis. Cancer Epidemiology and Prevention Biomarkers, 15(6):1159–1169, 2006.
[4] Norman F Boyd, Helen Guo, Lisa J Martin, Limei Sun, Jennifer Stone, Eve Fishell, Roberta A Jong, Greg Hislop, Anna Chiarelli, Salomon Minkin, et al. Mammographic density and the risk and detection of breast cancer. New England Journal of Medicine, 356(3):227–236, 2007.
[5] Brandi T Nicholson, Alexander P LoRusso, Mark Smolkin, Viktor E Bovbjerg, Gina R Petroni, and Jennifer A Harvey. Accuracy of assigned BI-RADS breast density category definitions. Academic Radiology, 13(9):1143–1149, 2006.
[6] Constance D Lehman, Adam Yala, Tal Schuster, Brian Dontchos, Manisha Bahl, Kyle Swanson, and Regina Barzilay. Mammographic breast density assessment using deep learning: clinical implementation. Radiology, 290(1):52–58, 2018.
[7] Nan Wu, Krzysztof J Geras, Yiqiu Shen, Jingyi Su, S Gene Kim, Eric Kim, Stacey Wolfson, Linda Moy, and Kyunghyun Cho. Breast density classification with deep convolutional neural networks. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6682–6686. IEEE, 2018.
[8] Aly A Mohamed, Yahong Luo, Hong Peng, Rachel C Jankowitz, and Shandong Wu. Understanding clinical mammographic breast density assessment: a deep learning perspective. Journal of Digital Imaging, 31(4):387–392, 2018.
[9] Aly A Mohamed, Wendie A Berg, Hong Peng, Yahong Luo, Rachel C Jankowitz, and Shandong Wu. A deep learning method for classifying mammographic breast density categories. Medical Physics, 45(1):314–321, 2018.
[10] Juhun Lee and Robert M Nishikawa. Automated mammographic breast density estimation using a fully convolutional network. Medical Physics, 45(3):1178–1190, 2018.
[11] Ziba Gandomkar, Moayyad E Suleiman, Delgermaa Demchig, Patrick C Brennan, and Mark F McEntee. BI-RADS density categorization using deep neural networks. In Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, volume 10952, page 109520N. International Society for Optics and Photonics, 2019.
[12] R Sawyer Lee, Francisco Gimenez, Assaf Hoogi, and Daniel Rubin. Curated breast imaging subset of DDSM. The Cancer Imaging Archive, 8, 2016.
[13] Rebecca Sawyer Lee, Francisco Gimenez, Assaf Hoogi, Kanae Kawai Miyake, Mia Gorovoy, and Daniel L Rubin. A curated mammography data set for use in computer-aided detection and diagnosis research. Scientific Data, 4:170177, 2017.
[14] Kenneth Clark, Bruce Vendt, Kirk Smith, John Freymann, Justin Kirby, Paul Koppel, Stephen Moore, Stanley Phillips, David Maffitt, Michael Pringle, et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. Journal of Digital Imaging, 26(6):1045–1057, 2013.
[15] Mpl colormaps. https://bids.github.io/colormap/, 2019. Accessed: 2019-08-28.
[16] Torchvision models. https://pytorch.org/docs/master/torchvision/models.html, 2019. Accessed: 2019-08-28.
[17] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
[18] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017.
[19] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.


