
Breast Cancer Screening Using Convolutional Neural Network and Follow-up Digital Mammography

Yufeng Zheng (a), Clifford Yang (b), Alex Merkulov (b)

(a) Alcorn State University, Lorman, MS, USA
(b) University of Connecticut Health Center, Farmington, CT, USA

Email: yzheng@alcorn.edu

ABSTRACT

We propose a computer-aided detection (CAD) method for breast cancer screening using convolutional neural network
(CNN) and follow-up scans. First, mammographic images are examined by three cascading object detectors to detect
suspicious cancerous regions. Then all regional images are fed to a trained CNN (based on the pre-trained VGG-19
model) to filter out false positives. Three cascading detectors are trained with Haar features, local binary pattern (LBP)
and histograms of oriented gradient (HOG) separately via an AdaBoost approach. The bounding boxes (BBs) from three
featured detectors are merged to generate a region proposal. Each regional image, consisting of three channels, current
scan (red channel), registered prior scan (green channel) and their difference (blue channel), is scaled to 224×224×3 for
CNN classification. We tested the proposed method on our digital mammographic database, which includes 69 cancerous
subjects (mass or architectural distortion) and 27 healthy subjects. Each subject has two scans: a current scan (cancerous
or healthy) and a prior scan (healthy, 1 year before). On average, 165 BBs are created by three cascading classifiers on each
mammogram, but only 3 BBs remain per image after the CNN classification. The overall performance is described as
follows: sensitivity = 0.928, specificity = 0.991, FNR = 0.072, and FPI (false positives per image) = 0.004. Considering
the early-stage cancerous status (1-year ago was normal), the performance of the proposed CAD method is very
promising.

Keywords: Computer-aided detection (CAD), Breast cancer screening, Convolutional neural network (CNN), Transfer
learning, Follow-up digital mammography, VGG-19.

1. INTRODUCTION

There are three types of breast lesions according to the ACR BI-RADS® lexicon: mass, calcification and architectural
distortion (AD). Calcifications are relatively easy to detect. However, masses and ADs are more challenging,
especially at their early stages. Computer-aided detection (CAD) tools are considered a radiologist’s “second pair of
eyes”, which mark suspicious regions but leave the final decision to the radiologists. CAD tools and digital
mammograms not only save time in diagnosing cancers and improve detection rates, but also bring hope for early
diagnosis and treatment of breast cancer. A typical CAD solution comprises two steps: cancer detection to locate a
lesion and cancer classification to confirm it. The first step is very challenging because the size, location, and features of
a lesion differ widely among cases. If a detection fails (i.e., a lesion is missed), there is no way to find the lesion in
the second step.
A mass is defined as a space-occupying lesion seen in at least two different projections.1 Masses are described by their
shape (Round, Oval, Lobulated, Irregular) and margin characteristics (Circumscribed, Microlobulated, Obscured,
Ill-Defined, Spiculated). On mammograms, masses appear denser than healthy tissues. However, the patterns of mass
lesions are difficult to define directly by intensities or gradients because of large variations among individuals. For
example, masses are quite difficult to recognize in dense breasts. Therefore, many advanced features are used to
identify mass lesions in screening mammograms in the literature. In general, neighborhood or regional textural features
are generated by jointly considering differences of orientations and correlation of scales. Kegelmeyer et al.2 developed a
method to detect spiculated masses using a set of 5 features for each pixel. They used the standard deviation of a local
edge orientation histogram (i.e., analysis of local oriented
edges, ALOE) and a subset of Laws’ texture features (of
four dimensions). To address the variant mass size
problem, Liu et al.3 proposed a multi-resolution algorithm
by using the discrete wavelet transform based on
Kegelmeyer et al.’s work. Matsubara et al. 4 presented an
adaptive thresholding technique for the detection of
masses. Qian et al.5 developed a multi-resolution and
multi-orientation wavelet transform for the detection of
masses and spiculation analysis. They observed that
traditional wavelet transforms cannot extract directional
information which is crucial for spiculation detection.
Zheng6,7 proposed a “Gabor Cancer Detection” (GCD)
algorithm that consists of three steps: preprocessing,
segmentation (generating alarm segments), and
classification (reducing false alarms). “Circular Gaussian
Filter” (CGF) was introduced for segmentation and Gabor
features were used for classification. The experimental
results tested on the DDSM database (University of South
Florida) showed the promise of GCD algorithm in mass
detection: TPR (true positive rate) = 90% at FPI (false
positives per image) = 1.21. Similar to other texture-based
methods, the GCD algorithm is quite complicated and
requires heavy computation time.
Mammographers compare current mammograms with
prior images to make decisions utilizing temporal
changes. Few researchers reported breast cancer
detections using temporal analysis plus Gabor features.
Zheng et al.8 used current and prior mammograms to
detect breast cancer masses without Gabor filtering. Tan9
utilized current and prior mammograms and Gabor filters
and achieved an AUC (area under ROC curve) of 0.725 ±
0.026. Rangayyan et al.10 used single time point data to
detect architectural distortions with Gabor filtering
achieving an AUC of 0.61.
Deep learning and CNNs provide a new path to
mammographic screening. Soriano et al.11 applied random
and grid search algorithms for mammogram classification
based on CNN. 85.00% accuracy was reported in
classifying benign and malignant mammograms tested on
the Digital Database for Screening Mammography
(DDSM). Jadoon et al.12 extracted dense scale invariant
features (DSIFT) from discrete wavelet (DW) and
curvelet transform (CT) of mammograms, which are fed
to CNN for classification. CNN-DW and CNN-CT
achieved accuracy rates of 81.83% and 83.74%,
respectively, when tested on the DDSM and the
Mammographic Image Analysis Society (MIAS)
database. Jiang et al.13 applied pre-trained GoogLeNet and
AlexNet for classification of breast mass lesions, and achieved AUC = 0.88 and AUC = 0.83 when evaluated on a new
mammographic dataset.

Fig. 1: Diagram of the proposed CAD method for breast cancer detection: Dif means the difference image between current scan
and prior scan. CNN-classified marks are multiple bounding boxes highlighting cancerous areas, wherein no mark indicates
normal status.

Early cancer detection with mammograms is very challenging due to the large variance of cancer (mass) patterns. There
are many factors that impact the cancer appearance on mammograms such as type/stage of cancer, size/density of breast,
individual differences, etc. The location and size of cancer lesions vary from case to case, which makes cancer detection
very difficult. Inspired by face detection and deep learning CNN, we propose a CAD method that uses multiple object
detectors, follow-up scans, and deep learning CNN for early-stage cancer detection. AD is treated as a special case of mass
and is covered by the term mass detection, so the training samples must include both mass and AD cases.
The proposed research is an innovative solution for breast cancer detection that incorporates object detection, temporal
analysis and CNN into one CAD model.
The objective of this research is to find a CAD solution that automatically detects and locates cancers accurately and
quickly. The remainder of this paper is organized as follows: Section 2 provides an overview of the proposed CAD
method. Section 3 presents cascading object detection methods to create a regional proposal. Section 4 describes the
CNN model for breast cancer classification. Section 5 presents experiments, results and discussion. Section 6 concludes
the paper.

2. OVERVIEW OF THE PROPOSED CAD METHOD

A mass may not be visually perceived when it is small or homogeneous with surrounding tissues in its initial phase. The
current CAD methods are not sufficiently accurate in detecting early-stage masses. Possible reasons for limited
performance of existing CAD methods are lack of multiscale analysis and temporal analysis. Notice that
mammographers compare current mammograms with prior images to make decisions utilizing temporal changes. A
CAD model integrating both spatial and temporal features is anticipated to detect early-stage masses. Upon detection (a
rectangle showing the suspicious cancer area), texture features are usually extracted from the small rectangular area, and
then a classifier is trained to categorize a lesion as malignant or healthy (non-cancer); this is the cancer classification stage.
A CNN model is a good option for feature extraction and classification.
The key steps of the proposed CAD method (see Fig. 1) are described as follows: (1) Preprocess all mammographic
images; (2) Create a region proposal using three cascading object detectors; (3) Refine the region proposal by keeping
the regions voted by two or more detectors; (4) Create a 3-channel image with current scan (red), registered prior scan
(green), and their difference image (blue); (5) Remove false positive marks (BBs) by applying an adapted VGG-19
network to 3-channel regional images; (6) Annotate the mammogram as cancerous if any positive marks are present, or
healthy if all marks are removed by the CNN.

Fig. 2 Original digital mammograms of Case# 37 (current exam with mass present) from
UCHC database: (a) Right CC view (3328×4096 pixels); (b) Right MLO view
(3328×4096 pixels). Note the large areas of dark background at the left side of the image.

2.1 Mammographic image preprocessing


Digital mammograms are originally stored in DICOM format. Each mammographic image has a large size (e.g.,
3328×4096 pixels) and 12 bits per pixel (see Fig. 2). The mammographic images are first normalized, e.g., by
scaling image intensity to the range [0, 1]. For fast processing, each original image is then downsampled
to ¼ of its original size (i.e., reduced to half size in both the row and column directions) and quantized to 8 bits per
pixel. The dark background areas are cropped off, which leaves the breast area (region of interest) for further processing
(refer to Fig. 3).
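The preprocessing chain described above can be sketched in a few lines. This is only an illustration under stated assumptions: the image is a plain list of rows, downsampling is simple decimation without any smoothing, and the function name `preprocess` and the background cutoff `background_level` are hypothetical, not values taken from our implementation.

```python
def preprocess(image_12bit, background_level=12):
    """Normalize, downsample, quantize, and crop one mammogram.

    image_12bit: list of rows of integer intensities in [0, 4095].
    background_level: hypothetical 8-bit cutoff; pixels at or below it are
    treated as dark background (not a value from the paper).
    """
    # 1) Scale 12-bit intensities to the range [0, 1].
    norm = [[p / 4095.0 for p in row] for row in image_12bit]

    # 2) Keep every second row and column: half size per axis, 1/4 of the pixels.
    small = [row[::2] for row in norm[::2]]

    # 3) Quantize to 8 bits per pixel.
    quant = [[round(p * 255) for p in row] for row in small]

    # 4) Crop to the bounding box of pixels brighter than the background.
    rows = [r for r, row in enumerate(quant) if max(row) > background_level]
    cols = [c for c in range(len(quant[0]))
            if max(row[c] for row in quant) > background_level]
    if not rows or not cols:
        return []  # the image contains nothing but background
    return [row[cols[0]:cols[-1] + 1] for row in quant[rows[0]:rows[-1] + 1]]
```

A production pipeline would normally low-pass filter before decimating and would read the pixel data from the DICOM file; both steps are omitted here to keep the sketch self-contained.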
To create a 3-channel image from the current scan, the prior scan and their difference image (it looks like a false-colored
image; refer to Section 5.3 and Fig. 9c), the two mammograms (prior vs. current) must be aligned. An image registration technique
such as normalized mutual information (NMI) with affine transforms35,36 is applied for image alignment.
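For readers unfamiliar with NMI, the similarity measure itself can be sketched as below. The sketch assumes the common definition NMI = (H(A) + H(B)) / H(A, B) computed from intensity histograms of two equally sized 8-bit images; the bin count and the function names are illustrative choices, and the search over affine transform parameters that maximizes this score is not shown.

```python
import math
from collections import Counter

def entropy(counts, total):
    """Shannon entropy (in bits) of a histogram given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def normalized_mutual_information(img_a, img_b, bins=32):
    """NMI = (H(A) + H(B)) / H(A, B) for two equally sized 8-bit images."""
    # Map 8-bit intensities to coarse histogram bins.
    a = [p * bins // 256 for row in img_a for p in row]
    b = [p * bins // 256 for row in img_b for p in row]
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_ab = entropy(Counter(zip(a, b)).values(), n)
    if h_ab == 0:
        return 2.0  # both images constant; convention assumed for this sketch
    return (h_a + h_b) / h_ab
```

With this definition the score ranges from 1 (statistically independent images) to 2 (identical images), so a registration routine would keep the transform that gives the largest value.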

Fig. 3 Preprocessed digital mammograms of Case# 37 from UCHC database: (a-b) Right CC (1051×1521 pixels) and MLO
(1069×1746 pixels) views of the current exam with mass present and marked by yellow rectangle; (c-d) Right CC and MLO views of
the 1-year prior exam (not aligned yet), which was normal.

3. CREATION OF REGION PROPOSAL

3.1 Creation of region proposal


Object detection finds the locations of desired objects in a scene; examples include face detection, vehicle detection, and
pedestrian detection. Many methods have been developed for such applications. We review three commonly used
features for breast cancer detection: Haar features, local binary pattern, and histograms of oriented gradient. These
features will be used to create a region proposal that includes possible cancerous areas.

Fig. 4: This figure shows four rectangles (A-D) initially used by Viola and Jones14,15 to represent images for the learning
algorithm. Both the light and shaded regions are added up separately. The sum of the light regions is then subtracted from the
sum of the shaded regions. (E-H) are the extended Haar-like features.
3.1.1 Haar features and Viola-Jones algorithm
Face detection methods are well developed and quickly mark multiple faces on a picture regardless of their sizes and
backgrounds, utilizing spatial changes. The Viola-Jones (VJ) algorithm has become a very common method of object
detection, including face detection. Viola and Jones proposed this algorithm as a machine learning approach for object
detection with an emphasis on obtaining results rapidly and with high detection rates. The VJ method rests on three
key ideas. The first is an image representation structure called integral images,14,15 wherein features are
calculated by taking the sum of pixels within multiple rectangular areas. This is of course an extension of a method by
Papageorgiou et al.16
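To make the integral-image idea concrete, the following minimal sketch builds an integral image and evaluates one two-rectangle Haar-like feature with four table lookups per rectangle. The function names are illustrative, and the choice of which half counts as the light or the shaded region is an assumption made for the example.

```python
def integral_image(img):
    """ii[r][c] holds the sum of img over all rows < r and columns < c."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def haar_two_rect(ii, top, left, height, width):
    """Two-rectangle feature: shaded (right half) minus light (left half)."""
    half = width // 2
    light = rect_sum(ii, top, left, height, half)
    shaded = rect_sum(ii, top, left + half, height, half)
    return shaded - light
```

Because every rectangle sum costs the same four lookups regardless of its size, the feature can be evaluated at any scale in constant time once the integral image is available.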
The rectangles shown in Fig. 4 (A-D) are the four rectangles that were initially used by Viola and Jones. Lienhart and Maydt later
extended the VJ algorithm to allow for rectangle rotations of up to 45 degrees (Fig. 4G-H).17 Using these methods, the sums of the
white and the shaded regions of each rectangle are calculated independently. The sum of the white region is then subtracted from
the sum of the shaded region. Viola and Jones admit that the use of rectangles is rather primitive, but they allow for high
computational efficiency in calculations.14,15 This also lends itself well to the AdaBoost algorithm that is used to learn features
in the image. This extension of the AdaBoost algorithm allows the system to select a set of features and train
the classifiers, a method first discussed by Freund and Schapire.12 This learning algorithm allows the system to learn the
differences between the integral representations of faces and those of the background.

Fig. 5 Cascading object detector or classifier.

The last contribution of the VJ method is the use of cascading classifiers (refer to Fig. 5). At each stage, the classifier
either rejects the instance (represented as a sliding-window from a given test image) based on a given feature value or
sends the instance down the tree for more processing. Initially, a large number of negative examples are eliminated. In
order to significantly speed the system up, Viola and Jones avoid areas that are highly unlikely to contain the object. The
image is tested with the first cascade, and the areas that do not conform to the first cascade are rejected and no longer
processed. The areas that may contain the object are further processed until all of the classifiers are tested. The areas in
the last classifier are likely to contain the object (e.g., the face).
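The early-rejection behavior of a cascade can be sketched as follows. Each stage is modeled as a scoring function with a threshold; the stage functions, feature names and thresholds in the usage note are hypothetical placeholders rather than trained detector stages.

```python
def run_cascade(window, stages):
    """Pass one sliding window through the cascade.

    stages: list of (score_function, threshold) pairs, ordered from the
    cheapest stage to the most selective one.
    Returns (accepted, number_of_stages_evaluated).
    """
    for index, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            return False, index  # rejected here; later stages never run
    return True, len(stages)
```

For example, with `stages = [(lambda w: w["mean"], 0.3), (lambda w: w["contrast"], 0.5)]`, a window with a low mean is discarded after a single stage, which is where the speed of the cascade comes from: most background windows never reach the expensive later stages.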
3.1.2 Local binary pattern
The local binary pattern (LBP) feature descriptor, one of the most popular feature extraction methods, is
based on various local coding operators originally applied in texture description. Due to their relative
robustness under local lighting variation, LBP features are extensively applied in face recognition
and modified to adapt to real applications. Currently, studies of the LBP feature focus on exploring extra details in face
images and reducing the dimension. Some works improve on LBP, such as Xie et al.,18 who applied the
Gabor wavelet transformation to the original texture space to build Local XOR Patterns of Gabor Phase (LGXP).

Fig. 6 Illustration of the LBP encoding process.

The LBP texture analysis operator was first introduced as a complementary measure for local image contrast.19 The first
operator worked with the eight neighbors of a pixel, using the value of the center pixel as a threshold. An LBP code for a
neighborhood was produced by multiplying the thresholded values by the weights given to the corresponding pixels, and
summing up the result. The LBP encoding process is illustrated in Fig. 6.
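\[
\mathrm{LBP}(x_c, y_c) = \sum_{p=0}^{7} S\big[ I(x_p, y_p) - I(x_c, y_c) \big] \times 2^{p}, \tag{1}
\]
where $(x_c, y_c)$ is the center pixel, $(x_p, y_p)$, $p = 0, \dots, 7$, are its eight neighbors, and
\[
S(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases} \tag{2}
\]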
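A direct transcription of Eqs. (1) and (2) for a single pixel is sketched below. The order in which the eight neighbors are visited (clockwise from the top-left corner) is an assumption; any fixed order gives a valid LBP code as long as it is used consistently.

```python
def lbp_code(img, r, c):
    """8-neighbor LBP code of the pixel at row r, column c (not on the border)."""
    center = img[r][c]
    # Neighbor offsets for p = 0..7, clockwise from the top-left corner
    # (assumed ordering, chosen only for this illustration).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for p, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] - center >= 0:  # S(x) = 1 when x >= 0
            code += 2 ** p
    return code
```

A histogram of these codes over a window is what a detector would typically use as the LBP feature for that window.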
3.1.3 Histograms of oriented gradient (HOG)
The Histograms of Oriented Gradient (HOG) are the adaptation of Lowe’s Scale Invariant Feature Transformation
(SIFT)20 approach with local spatial histogramming and normalization. A HOG feature is created by first computing the
gradient magnitude and orientation at each image pixel in a region around an anchor point (keypoint). The region is split
into N×N subregions. Orientations are quantized by the number of bins in the histogram (typically four orientations). For
each histogram bin (orientation), we compute the sum of all magnitudes within the subregion having that particular
orientation. The histogram values are then normalized by
the total energy of all orientations to obtain values
between 0 and 1. Concatenating the histograms from all
the subregions gives the final HOG feature vector. Sobel
filters may be used to compute the gradient.
As illustrated in Fig. 7, the extraction of HOG features is summarized as follows.
• Compute the (Sobel) gradient magnitude and orientation;
• Quantize the orientations into 4 bins for the histogram;
• Sum all magnitudes within a subregion having a particular orientation; and
• Concatenate the histograms from all the subregions to give the final HOG feature vector.

Fig. 7 Illustration of the extraction of HOG features.

HOG features are widely used for vehicle detection together with a neural network (NN) classifier,21 an AdaBoost
classifier,22 and a support vector machine (SVM) classifier (HOG–SVM).23
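The four HOG steps listed above can be sketched in plain Python as follows. This is an illustration under assumptions, not the detector code: orientations are treated as unsigned (0 to 180 degrees), border pixels of each subregion are skipped, each subregion histogram is normalized by its own total magnitude, and the function names are hypothetical.

```python
import math

def hog_cell_histogram(cell, bins=4):
    """Magnitude-weighted orientation histogram of one subregion."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Sobel gradients in the column (gx) and row (gy) directions.
            gx = ((cell[r - 1][c + 1] + 2 * cell[r][c + 1] + cell[r + 1][c + 1])
                  - (cell[r - 1][c - 1] + 2 * cell[r][c - 1] + cell[r + 1][c - 1]))
            gy = ((cell[r + 1][c - 1] + 2 * cell[r + 1][c] + cell[r + 1][c + 1])
                  - (cell[r - 1][c - 1] + 2 * cell[r - 1][c] + cell[r - 1][c + 1]))
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += magnitude
    total = sum(hist)
    # Normalize by the total energy so that the values lie in [0, 1].
    return [v / total for v in hist] if total > 0 else hist

def hog_feature(region, n=2, bins=4):
    """Split a region into n x n subregions and concatenate their histograms."""
    sh, sw = len(region) // n, len(region[0]) // n
    feature = []
    for i in range(n):
        for j in range(n):
            cell = [row[j * sw:(j + 1) * sw] for row in region[i * sh:(i + 1) * sh]]
            feature.extend(hog_cell_histogram(cell, bins))
    return feature
```

The resulting vector has n × n × bins entries, e.g., 16 values for a 2×2 split with four orientation bins.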
3.1.4 Advantages of cascading detectors with simple features
(1) Simple features for fast detection: It is viable to detect mass areas using the Viola-Jones (VJ) algorithm. The
standard Haar-like features used in face detection are demonstrated in Fig. 4 (A-D). We apply both the standard features
and the extended Haar-like features (shown in Fig. 4E-H) for mass detection. The integral image is calculated with the
preprocessed mammogram. Once the integral image is ready, the Haar-like features only involve addition and
subtraction. Therefore, this feature extraction is very efficient and can be completed in a short time.
(2) Multiscale analysis to detect masses of various sizes: Once a detector (as shown in Fig. 5) is trained, detection is
done by sliding a window across an input image and passing the cropped sub-image through the classifier (i.e., detector).
In order for classification to be size-invariant, the same procedure is also performed on the input integral image at
various scales. Given this scheme, the output of classification is a series of sub-windows of the input image which
contain the detected cancers. Cancer detection can be implemented with the VJ method by varying the size of the sliding
window (rectangle). A real cancer may result in multiple nearby detections. It is necessary to combine overlapping
detections into a single detection (rectangle).
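The multiscale scan can be sketched as a generator of square windows whose size grows geometrically. The base window size, scale step and stride used here are illustrative defaults, not the settings of our detectors.

```python
def sliding_windows(height, width, base_size=24, scale_step=1.25,
                    stride_fraction=0.5):
    """Yield (top, left, size) for square windows at several scales.

    base_size, scale_step and stride_fraction are hypothetical defaults.
    """
    size = base_size
    while size <= min(height, width):
        stride = max(1, int(size * stride_fraction))
        for top in range(0, height - size + 1, stride):
            for left in range(0, width - size + 1, stride):
                yield top, left, size
        # Grow the window; always advance by at least one pixel.
        size = max(size + 1, int(round(size * scale_step)))
```

Each yielded window would be passed through the trained cascade, and windows that survive all stages become the detected BBs for that detector.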
(3) Feature selection and classifier training using AdaBoost approach: At each stage (see Fig. 5), an AdaBoost-like
approach is applied to selecting one or more Haar-like simple features, as well as determining appropriate thresholds
which can be applied to reject a large number of negative training instances. Important input parameters for the training
procedure are the minimum true positive rate (TPR) and maximum false positive rate (FPR) – the search for optimal
feature and threshold selection will continue until those two requirements are met, at which point the remaining training
examples will be passed on to the next stage. The feature optimization is obtained by reweighting the training samples so that the
inputs that were misclassified get higher weight in the learning process. For example, if those two parameters are set to
0.995 and 0.5 respectively, at each stage, feature selection and threshold optimization will be applied until the resulting
stage is capable of classifying 99.5% of the positive instances as positive and does not classify more than 50% of the
negative images as positive.
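The effect of stacking stages with these per-stage targets can be checked with a short calculation. The sketch assumes that every stage exactly meets its targets and that stage decisions combine multiplicatively, which is an idealization; the 20-stage value is the number of stages reported in Section 5.2.

```python
def cascade_rates(stage_tpr, stage_fpr, num_stages):
    """Idealized overall (TPR, FPR) of a cascade of identical-target stages."""
    return stage_tpr ** num_stages, stage_fpr ** num_stages

overall_tpr, overall_fpr = cascade_rates(0.995, 0.5, 20)
```

With a minimum TPR of 0.995 and a maximum FPR of 0.5 per stage, 20 stages give an overall TPR of about 0.905 and an overall FPR of about 9.5×10^-7, which illustrates the trade-off between false positive reduction and detection rate discussed in item (4).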
(4) Reduction of false positives with cascading classifiers: As shown in Fig. 5, each stage of the classifier can make
use of more than one feature in order to meet the requirements, in which case each stage can be viewed as a decision
tree. It is also important to note that at each stage, the classifier uses a different set of negative training images which are
sampled from a given database of images that do not contain the specified object (i.e., non-cancerous images). After
training the desired number of stages, the result is a cascade of tree-like classifiers (i.e., detectors). The structure of the
resulting classifier is essentially that of a degenerate decision tree. Each added stage to the classifier tends to reduce the
false positive rate, but also reduces the detection rate (true positive rate). As such, it is essential to train the classifier
with the appropriate number of stages for the given task. The training process is time-consuming but the detection
(testing) process (i.e., thresholding) is very fast.
3.2 Refining the region proposal with majority vote
For each mammogram, a region proposal is the union (logical OR) of the BBs detected by the three detectors. As shown
in Fig. 9a, there are many detected regions (BBs). To reduce false positives, a majority vote is applied to the region
proposal. If a region is detected as positive by two or more featured detectors, then this region is kept; otherwise it is
removed. The refined region proposal (see Fig. 9b) will be fed to CNN classification for false positive removal.

4. REGION CLASSIFICATION USING ADAPTED VGG-19 CNN

4.1 Convolutional neural network


Convolutional neural networks (CNNs) combine ideas from biology, math and computer science, and these
networks have been among the most influential innovations in the field of computer vision and artificial intelligence (AI). 2012
was the first year that CNNs grew to prominence, as Alex Krizhevsky24 used an 8-layer CNN (5 conv., 3 fully-connected)
to win that year’s ImageNet competition (referred to as AlexNet thereafter), dropping the classification error record from
25.8% (in 2011) to 16.4% (in 2012), an astounding improvement at the time. Since then many companies have been
using deep learning at the core of their services. For example, Facebook uses neural nets for their automatic tagging
algorithms, Google for their photo search, Amazon for their product recommendations, Pinterest for their home feed
personalization, and Instagram for their search infrastructure.

CNNs take a biological inspiration from the visual cortex. The visual cortex has small regions of cells that are sensitive
to specific regions of the visual field. This idea was expanded upon by a fascinating experiment by Lettvin et al.25 in
1959 where they showed that some individual neuronal cells in the brain responded (or fired) only in the presence of
edges of a certain orientation. For example, some neurons fired when exposed to vertical edges and some when shown
horizontal or diagonal edges. Lettvin et al. found out that all of these neurons were organized in a columnar architecture
and that together, they were able to produce visual perception.26 This idea of specialized components inside of a system
having specific tasks (the neuronal cells in the visual cortex looking for specific characteristics) is one that machines use
as well, and is the basis behind CNNs.

A common misconception in the deep learning community is that without a huge amount of data, it is not possible to
create effective deep learning models. While data is a critical part of creating the network, the idea of transfer learning
has helped to lessen the data demands. Transfer learning is the process of taking a pre-trained model (the weights and
parameters of a network that has been trained on a large dataset) and fine-tuning the model with your small dataset. The
idea is that the pre-trained model will act as a feature extractor. In general, the last layer of the network is replaced with
your classification layer (sized to the number of classes in your problem), and the network is then trained (adapted) normally.

Let us say the pre-trained model was trained on ImageNet (ImageNet is a dataset that contains 14 million images with
over 1,000 classes).27 The lower layers of the network will detect features like edges and curves. Most likely your
network is going to need to detect curves and edges as well. Rather than training the whole network through a random
initialization of weights, it is more efficient and effective to use the weights of the pre-trained model and focus on the
more important layers (higher layers towards classification output) for training. If your dataset is quite different from
ImageNet, then you would want to train more of the higher layers.

4.2 VGG-19 model


Simonyan and Zisserman of the University of Oxford created a 19-layer (16 conv., 3 fully-connected) CNN, called the VGG-19
model,28,29 that strictly used 3×3 filters with stride and pad of 1, along with 2×2 max-pooling layers with stride 2.
Compared to AlexNet, the VGG-19 (see Fig. 8) is a deeper CNN with more layers. To reduce the number of parameters
in such deep networks, it uses small 3×3 filters in all convolutional layers, and it achieved a 7.3% error rate. The
VGG-19 model was not the winner of ILSVRC30 2014; however, the VGG Net paper is one of the most influential
because it reinforced the notion that CNNs need a deep network of layers in order for this hierarchical
representation of visual data to work. Keep it deep. Keep it simple.
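The size of this architecture can be verified by counting weights and biases layer by layer. The channel widths and the 4096-unit fully-connected layers below follow the published 19-layer VGG configuration and are not restated in this paper; the 224×224×3 input and the 1000 output classes match the values quoted elsewhere in the text.

```python
# 3x3 convolution output channels per layer; "M" marks a 2x2 max-pooling layer.
VGG19_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
                512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def vgg19_parameter_count(input_size=224, in_channels=3, num_classes=1000):
    """Return (number of conv layers, total trainable parameters)."""
    total, channels, size, conv_layers = 0, in_channels, input_size, 0
    for item in VGG19_LAYOUT:
        if item == "M":
            size //= 2  # pooling halves the spatial resolution
        else:
            total += 3 * 3 * channels * item + item  # weights + biases
            channels = item
            conv_layers += 1
    fc_in = channels * size * size  # 512 * 7 * 7 for a 224x224 input
    for fc_out in (4096, 4096, num_classes):
        total += fc_in * fc_out + fc_out
        fc_in = fc_out
    return conv_layers, total
```

Under these assumptions the count comes to 16 convolutional layers and 143,667,240 parameters, of which roughly 124 million sit in the three fully-connected layers.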

The VGG-19 model, with a total of about 144M parameters, was placed 2nd in classification and 1st in localization in ILSVRC
2014. This model is trained on a subset of the ImageNet27 database, which is used in the ImageNet Large-Scale Visual
Recognition Challenge (ILSVRC).30 The VGG-19 is trained on more than a million images and can classify images into
1000 object categories, for example, keyboard, mouse, pencil, and many animals. As a result, the model has learned rich
feature representations for a wide range of images.
Region-based CNN (R-CNN)31 generates a region proposal and uses CNN for object detection and classification. Fast
R-CNN32 and Faster R-CNN33 were proposed later to speed up the process and improve the accuracy. However, the
R-CNN method is not suitable for breast cancer detection since the mammograms dramatically vary in texture and size of
lesion (cancer) from case to case.

Fig. 8 Illustration of the network architecture of VGG-19 model: conv means convolution, FC means fully connected

4.3 VGG-19 model adaptation using 3-channel image made of both current and prior scans – capturing subtle
changes over time
Radiologists use current and prior mammograms (i.e., two examinations at different time periods) side by side to see
small changes over time, and then make a diagnostic decision. Our digital mammographic database contains
mammograms from current and prior exams (typically 1 year prior). In order to use both current and prior exams and
their difference image, the two mammograms must be aligned using an image registration technique.
Since the VGG-19 model takes a color image as input, a 3-channel image (see Fig. 9c) is created by assigning (current
exam, prior exam, difference image) to (red, green, blue) channel, respectively. All regional images (from the refined
region proposal) are cropped from the 3-channel image and scaled to 224×224×3 for VGG-19 training and testing. In
this way, subtle changes over time are reflected in this 3-channel image and featured in the adapted VGG-19 model.

5. EXPERIMENTAL RESULTS AND DISCUSSION

5.1 Digital mammogram database


A retrospective study is conducted with 96 subjects (69 cancerous vs. 27 healthy) of originally digital mammograms (not
digitized from films) collected at University of Connecticut Health Center (UCHC), called UCHC DigiMammo
(UCHCDM) database.34 We are still in the process of data collection to get more mammographic images. Each case
includes 4 mammograms (two views [CC and MLO] from two sides) imaged at two different times, referred to as the
current and prior exam or scan (see Fig. 3). All mammographic images are deidentified, annotated in a descriptive text
file with known pathology (healthy, mass, AD, calcification) and circled at the locations of cancers (if any) on a separate
key image. These annotations are the ground truths for CAD model training and testing.
Among the 69 cancerous subjects, there are 28 labels (i.e., rectangles shown in Fig. 3) for ADs and 41 for masses, annotated by our
expert radiologists. All mammographic images were preprocessed by normalization, downsampling, quantizing, and
cropping (refer to Section 2.1), and registration between current and prior scans. All analyzed cases have both current
and prior scans.
5.2 Creation of region proposal using three cascading detectors and refinement with majority vote
Three 20-stage cascading detectors were trained by using three classical features: Haar, LBP, and HOG,34 respectively.
A 50% per-stage false alarm rate and twice as many negative (healthy) samples as positive samples were used during the
training process. Note that after 20 stages the false alarm rate should theoretically be reduced to 0.5^20 (less than 1 in a
million). The negative samples were obtained both from healthy images and from the non-cancerous areas in cancerous images.
The BBs (bounding boxes) detected by the three cascading detectors are shown in Figs. 9-13a.
The number of BBs is reduced by a simple majority vote, i.e., two or more detectors vote for the same or a similar region (BB).
For example, a BB is considered positive when both the Haar- and LBP-based detectors mark the same location (i.e., a
logical AND operation). In our experiments, a threshold of 0.5 on the overlapping ratio between two BBs was applied to
refine the region proposal for CNN (as shown in Figs. 9-13b). This process effectively reduces false positives.
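The voting rule can be sketched as follows. The overlapping ratio is taken here to be intersection over union (IoU), which is an assumption since the exact definition is not spelled out above; boxes are (x, y, width, height) tuples and the function names are illustrative.

```python
def overlap_ratio(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)

def majority_vote(boxes_by_detector, threshold=0.5):
    """Keep a box only if a box from a different detector overlaps it enough."""
    kept = []
    for name, boxes in boxes_by_detector.items():
        for box in boxes:
            supported = any(
                overlap_ratio(box, other) >= threshold
                for other_name, others in boxes_by_detector.items()
                if other_name != name
                for other in others
            )
            if supported:
                kept.append(box)
    return kept
```

For instance, a Haar box at (10, 10, 20, 20) and an LBP box at (12, 12, 20, 20) have an IoU of about 0.68 and both survive, whereas an isolated box from a single detector is dropped.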

Fig. 9 CAD analyses with Case #37, right breast with mass present marked with a green rectangle, Top/Bottom: CC/MLO. (a)
Stage-1: Region proposal from 3 cascading detectors (138/188 cyan rectangles); (b) Stage-2: Refined region proposal voted by 2
or more detectors (63/90 pink rectangles); (c) Stage-3.1: 3-channel image formed by current scan (red), prior scan (green, 2 years
ago), and their difference (blue) – the enlarged detections shown at upper-left corner; (d) Stage-3.2: CNN-classified and
annotated (3/4 yellow rectangles – detected cancers).

5.3 CNN classification with 3-channel images


The two exams of either the CC or MLO view must be aligned using an image registration technique. Then a difference image is
obtained by subtracting the prior exam from the current exam and scaling the result to the full intensity range. A 3-channel
image is created by assigning (current scan, prior scan, difference image) to the (red, green, blue) channels, which looks like
a false-colored image (see Figs. 9-13c). The regional images from the refined region proposal are cropped from the
3-channel image and scaled to 224×224×3, and are then used for CNN training based on a pre-trained VGG-19 model. The
VGG-19 training ran for 380 epochs (loops), and four times more negative samples (from healthy cases and
non-cancerous areas in cancerous cases) than positive samples were used.
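The 3-channel construction described in this subsection can be sketched as below. The sketch assumes the two scans are already registered, have the same size and are 8-bit; "scaled to the full intensity range" is interpreted as min-max scaling to [0, 255], which is an assumption, and cropping and resizing to 224×224 are not shown.

```python
def three_channel_image(current, prior):
    """Combine registered scans into per-pixel (R, G, B) tuples.

    R = current scan, G = prior scan, B = (current - prior) rescaled to 0..255.
    """
    diff = [[c - p for c, p in zip(cur_row, pri_row)]
            for cur_row, pri_row in zip(current, prior)]
    lo = min(min(row) for row in diff)
    hi = max(max(row) for row in diff)
    span = hi - lo

    def scale(value):
        # Min-max scaling (assumed); a constant difference image maps to 0.
        return round((value - lo) * 255 / span) if span else 0

    return [[(c, p, scale(d)) for c, p, d in zip(cur_row, pri_row, dif_row)]
            for cur_row, pri_row, dif_row in zip(current, prior, diff)]
```

In the resulting image, tissue that is unchanged between exams has similar red and green values, while regions that became denser since the prior exam stand out with a high blue value.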
The CAD performance is calculated by averaging the results of 10 runs, and each run is based on a random split of all
mammographic image samples, 75% for training vs. 25% for testing. The training of 3 cascading detectors (Haar, LBP,
and HOG) only uses the current exams, while the CNN training takes 3-channel regional images as inputs.
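The evaluation protocol (10 random 75%/25% splits, averaged) can be sketched as follows. The `evaluate` callback, which would train the detectors and the CNN on the training part and return confusion counts on the test part, is a hypothetical placeholder; the metric definitions are the standard ones for sensitivity, specificity and false negative rate.

```python
import random

def random_split(samples, train_fraction=0.75, rng=None):
    """Shuffle the samples and split them into training and testing parts."""
    rng = rng or random.Random()
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and false negative rate from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fnr": fn / (tp + fn),
    }

def average_runs(samples, evaluate, runs=10, seed=0):
    """Average the metrics over several random splits.

    evaluate(train, test) must return (tp, fn, tn, fp); it is a hypothetical
    callback standing in for the full training and testing pipeline.
    """
    rng = random.Random(seed)
    totals = {"sensitivity": 0.0, "specificity": 0.0, "fnr": 0.0}
    for _ in range(runs):
        train, test = random_split(samples, rng=rng)
        metrics = screening_metrics(*evaluate(train, test))
        for key in totals:
            totals[key] += metrics[key]
    return {key: value / runs for key, value in totals.items()}
```

Note that sensitivity and FNR always sum to 1 under these definitions, which is consistent with the reported values of 0.928 and 0.072.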

Fig. 10 CAD analyses with Case #43, right breast with architectural distortion present marked with a green rectangle, CC view:
(a) Stage-1: Region proposal from 3 cascading detectors (160 cyan rectangles); (b) Stage-2: Refined region proposal voted by 2
or more detectors (51 pink rectangles); (c) Stage-3.1: 3-channel image formed by current scan (red), prior scan (green, 1 year
ago), and their difference (blue) – the enlarged detections shown at lower-left corner; (d) Stage-3.2: CNN-classified and
annotated (7 yellow rectangles – detected cancers).



Fig. 11 CAD analyses with Case #63, left breast with architectural distortion present marked with a green rectangle, MLO view:
(a) Stage-1: Region proposal from 3 cascading detectors (213 cyan rectangles) – the enlarged detections shown at upper-right
corner; (b) Stage-2: Refined region proposal voted by 2 or more detectors (77 pink rectangles); (c) Stage-3.1: 3-channel image
formed by current scan (red), prior scan (green, 1 year ago), and their difference (blue); (d) Stage-3.2: CNN-classified and
annotated (0 yellow rectangles – false negative).

The CNN-classification results based on the refined region proposal are the final CAD results. As shown in Figs. 9-13(c)
(color presentation for illustration) and Figs. 9-13(d) (grayscale presentation for diagnosis), if the number of detections
(rectangles) is 0, the CAD result is normal; if the number is 1 or more, cancers are detected at the locations indicated by
the rectangles (BBs).
Consider the cases presented in Figs. 9-13. In Case #37 (Fig. 9), there are 3 and 4 detected BBs in the CC and MLO
views, respectively, and they all overlap well with the ground-truth BB (true positives). In Case #43 (Fig. 10), 7 BBs are
detected and all are close to the ground truth. In Case #63 (Fig. 11), 2 BBs from the same detector overlap with the
ground truth at Stage 1 but are missed at Stage 2 (i.e., filtered out by the voting, resulting in a false negative). In Case
#116 (Fig. 12), although 463 detections are present at Stage 1, all are eliminated at Stage 3 after CNN classification
(true negative). In Case #114 (Fig. 13), 1 false positive remains after Stage 3. Multiple overlapping BBs could be merged
into one larger BB, but no merging is performed in this paper.
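Merging is not part of the reported method, but one simple way to do it is sketched below: any two BBs that intersect are replaced by their enclosing rectangle, and the process repeats until no two BBs intersect. The (x, y, width, height) box format and the function names are assumptions.

```python
def overlaps(a, b):
    """True if boxes a and b, given as (x, y, w, h), share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def enclosing_box(a, b):
    """Smallest axis-aligned box that contains both a and b."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)

def merge_overlapping(boxes):
    """Repeatedly merge intersecting boxes until none intersect."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    merged[i] = enclosing_box(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break                    # restart the scan after every merge
    return merged

detections = [(10, 10, 20, 20), (20, 15, 20, 20), (100, 100, 10, 10)]
print(merge_overlapping(detections))
```

The enclosing rectangle may cover more tissue than any single detection, so a merged BB trades localization precision for a single, easier-to-read marking.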



Fig. 12 CAD analyses with Case #116, right breast with healthy condition: CC view: (a) Stage-1: Region proposal from 3
cascading detectors (463 cyan rectangles); (b) Stage-2: Refined region proposal voted by 2 or more detectors (204 pink
rectangles); (c) Stage-3.1: 3-channel image formed by current scan (red), prior scan (green, 1 year ago), and their difference
(blue); (d) Stage-3.2: CNN-classified with no yellow rectangles (no cancer detected, i.e., normal).



Fig. 13 CAD analyses with Case #114, left breast with healthy condition: CC view: (a) Stage-1: Region proposal from 3
cascading detectors (197 cyan rectangles); (b) Stage-2: Refined region proposal voted by 2 or more detectors (87 pink
rectangles); (c) Stage-3.1: 3-channel image formed by current scan (red), prior scan (green, 1 year ago), and their difference
(blue) – the enlarged detections shown at upper-right corner; (d) Stage-3.2: CNN-classified and 1 yellow rectangle (1 detected
cancer, i.e., 1 false positive).

The overall CAD performance is listed in Table 1: Sensitivity = 0.928, Specificity = 0.991, and FPI = 0.004.
Because all cancerous cases were normal 1-2 years earlier, these are early-stage cancers. The results show high
specificity, very few false positives, and high sensitivity, which indicates that the proposed CAD method is very promising.
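For reference, the standard definitions of these measures are sketched below. The counts passed to the function are hypothetical values chosen only to exercise the formulas; they are not the counts behind the results reported here, and the counting unit (BB, image, or subject) is left open.

```python
def cad_metrics(tp, fn, tn, fp, num_images):
    """Standard detection metrics from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),   # fraction of cancers detected
        "specificity": tn / (tn + fp),   # fraction of normals correctly cleared
        "fnr": fn / (tp + fn),           # false negative rate = 1 - sensitivity
        "fpi": fp / num_images,          # false positives per image
    }

# Hypothetical counts, for illustration only.
metrics = cad_metrics(tp=9, fn=1, tn=98, fp=2, num_images=50)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```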
The time costs of the proposed CAD method are also given in Table 1, where the mean values were averaged across all
detections over 985 images from 96 subjects. All CAD algorithms were implemented and run in Matlab 2017a (Version
9.2) on a laptop computer (MSI GT73VR) with the following configuration: Intel i7-7820HK CPU at 2.9 GHz, 16 GB
RAM, 1.25 TB hard disk, 64-bit Windows 10, and an NVIDIA GeForce GTX 1070 graphics board with 8 GB of on-board
video memory. VGG-19 training (380 epochs) takes a long time, 8.5 hours (510 minutes), but testing is fast (0.88 second per image).
The number of false positives per image from the proposed CAD (FPI = 0.004) is very small, which shows that the CNN is
very effective at eliminating false positives. Because the CNN can only classify the regions it receives, a cancer that is
not proposed in the earlier stages cannot be recovered, so we need to increase the number of positive detections at
Stage 1. We may relax the Stage-2 restriction (voting by 2 or more detectors), reconfigure the cascading detectors, or
add more detectors.
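The effect of relaxing the Stage-2 vote can be illustrated with the sketch below. It assumes one possible voting rule: a BB is kept if BBs from at least `min_votes` different detectors, counting its own, overlap it with an intersection-over-union (IoU) of at least `iou_threshold`. Both parameters, the IoU criterion, and the function names are assumptions for illustration and are not taken from the implementation used in this work.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)

def vote(detections, min_votes=2, iou_threshold=0.5):
    """Keep boxes supported by at least min_votes different detectors.

    detections maps a detector name to its list of boxes.
    """
    kept = []
    for name, boxes in detections.items():
        for box in boxes:
            votes = 1                    # the detector that produced the box
            for other, other_boxes in detections.items():
                if other != name and any(
                    iou(box, ob) >= iou_threshold for ob in other_boxes
                ):
                    votes += 1
            if votes >= min_votes:
                kept.append((name, box))
    return kept

# Hypothetical detections from the three detectors.
proposals = {
    "haar": [(0, 0, 10, 10), (50, 50, 10, 10)],
    "lbp": [(1, 1, 10, 10)],
    "hog": [(200, 200, 10, 10)],
}
print(len(vote(proposals, min_votes=2)), len(vote(proposals, min_votes=1)))
```

With `min_votes=1` every Stage-1 BB is passed on to the CNN, which raises the chance of retaining a true positive at the cost of more CNN evaluations per image.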
Table 1. Performance of the proposed CAD model for breast cancer detection.
FPI = number of false positives per image; BB = bounding box (rectangle).

# Subjects in database                  | 69 cancerous, 27 healthy
Mean performance (average of 10 runs)   | Sensitivity = 0.928, Specificity = 0.991, FPI = 0.004
# Detected BBs per image                | 3 detectors: 165; voted by 2 detectors: 63; after CNN classification: 3
Final BBs vs. the ground-truth BB       | OverlapRatio = 0.944, ConfidenceVal = 1.0
Total training time                     | 3 cascading detectors: 22 minutes; VGG-19 (380 epochs): 510 minutes
Mean detection time                     | 3 cascading detectors: 0.62 second per image; VGG-19 classification: 0.88 second per image

The two views (CC and MLO) are processed separately in this paper, but the CAD detections should be cross-referenced
between the two views. For example, detections from the two views could be used to confirm each other and to pinpoint
the cancer location in 3D space.
There are other ways to combine the two exams (current vs. prior), such as the absolute difference, a difference weighted
according to scan years, or unsharp masking. The open question is how radiologists extract changes from two exams,
which may not amount to a simple mathematical subtraction. Aligning two exams by image registration is challenging
and time consuming, and a better technique is needed.
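Two of these alternatives are sketched below next to the signed difference used in this paper. The weighting shown (dividing by the number of years between exams, to approximate change per year) is only one hypothetical reading of a weighted difference according to scan years. Unsharp masking is omitted because its exact role is not specified. The inputs are assumed to be registered arrays of equal size.

```python
import numpy as np

def signed_difference(current, prior):
    """Current minus prior, computed in floating point to avoid uint8 wrap-around."""
    return current.astype(np.float64) - prior.astype(np.float64)

def absolute_difference(current, prior):
    """Magnitude of change, ignoring whether tissue became brighter or darker."""
    return np.abs(signed_difference(current, prior))

def weighted_difference(current, prior, years_apart):
    """Hypothetical weighting: change per year between the two exams."""
    return signed_difference(current, prior) / float(years_apart)

# Tiny synthetic example (not real data).
current = np.array([[10, 200]], dtype=np.uint8)
prior = np.array([[30, 100]], dtype=np.uint8)
print(signed_difference(current, prior))
print(absolute_difference(current, prior))
print(weighted_difference(current, prior, years_apart=2))
```

Any of these arrays would still need to be scaled to the full intensity range before being placed in the blue channel.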

6. CONCLUSIONS

We propose a CAD method for breast cancer detection that takes advantage of cascading detectors, a CNN, and
follow-up exams. The three cascading detectors create a region proposal that contains possible cancerous areas,
while the adapted VGG-19 CNN is very effective in removing false positives. To detect subtle changes over time,
3-channel images are formed from the current scan, the prior scan, and their difference image, and then fed to the CNN
classifier. The proposed CAD method is validated by detecting masses and architectural distortions (ADs) in digital
mammograms (UCHC DigiMammo Database), with very promising results: Sensitivity = 0.928, Specificity = 0.991, and
FPI = 0.004. This model can be extended to the detection of calcifications.
The size of the database may limit statistical power; however, the cascading detectors and the neural network are
expected to become more accurate when trained with a larger database. The UCHC DigiMammo (UCHCDM) database is
still growing, and we will revisit our CAD method in the near future and expect better performance.

REFERENCES

[1] American College of Radiology, ACR BI-RADS — Mammography, Ultrasound & Magnetic Resonance Imaging (4th ed.),
American College of Radiology, Reston, VA (2003).
[2] Kegelmeyer, Jr., W.P., Pruneda, J.M., Bourland, P.D., et al., “Computer-aided mammographic screening for spiculated lesions,”
Radiology, vol. 191, pp.331–337 (1994).
[3] Liu, S.L., Babbs, C.F., and Delp, E.J., “MultiResolution Detection of spiculated Lesions in Digital Mammograms,” IEEE
Transactions on Image Processing, vol. 10, pp.874-884 (2001).
[4] Matsubara, T., Fujita, H., Endo, T., Horita, K., et al., “Development of mass detection algorithm based on adaptive thresholding
technique in digital mammograms,” presented at Digital Mammography (1996).
[5] Qian, W., Li, L., Clarke, L., Clark, R.A., and Thomas, J., “Comparison of adaptive and non adaptive CAD methods for mass
detection,” Academic Radiology, vol. 6, pp.471-480 (1999).
[6] Zheng, Y. and Agyepong, K., “Mass Detection with Digitized Screening Mammograms by Using Gabor Features”, SPIE
Proceedings Vol. 6514, pp.651402-1-12, San Diego (2007).
[7] Zheng, Y., “Breast Cancer Detection with Gabor Features from Digital Mammograms,” Algorithms, Vol. 3, No. 1, 44-62, (2010).
[8] Zheng, B., et al., “Performance change of mammographic CAD schemes optimized with most-recent and prior image databases,”
Acad Radiol 10:283–288, (2003).
[9] Tan, M., et al., “Assessment of a Four-View Mammographic Image Feature Based Fusion Model to Predict Near-Term Breast
Cancer Risk,” Ann Biomed Eng. (2015).
[10] Rangayyan, R. et al., “Computer-aided detection of architectural distortion in prior mammograms of interval cancer,” J Digit
Imaging, 23(5):611-31 (2010).
[11] Soriano, D., Aguilar, C., Ramirez-Morales, I., Tusa, E., Rivas, W., Pinta, M., “Mammogram Classification Schemes by Using
Convolutional Neural Networks,” CITT 2017, Communications in Computer and Information Science, vol. 798, pp 71-85,
Springer, Cham (2017)
[12] Jadoon, M. Mohsin et al., “Three-Class Mammogram Classification Based on Descriptive CNN Features,” BioMed Research
International 2017: 3640901 (2017).
[13] Jiang, F., Liu, H., Yu, S., Xie, Y., “Breast mass lesion classification in mammograms by transfer learning,” ICBCB '17
Proceedings of the 5th International Conference on Bioinformatics and Computational Biology, Pages 59-62, Hong Kong, 2017.
[14] Viola, P. and Jones, M., “Rapid Object detection using a boosted cascade of simple features.” Proceedings of CVPR, vol. 1 pp.
511–518 (2001).
[15] Viola, P. and Jones, M., “Robust Real-time Object Detection,” International Journal of Computer Vision, Vol. 57, Iss. 2, pp.
137–154 (2001).
[16] Papageorgiou, C., Oren, M., and Poggio, T., “A general framework for object detection,” Sixth International Conference on
Computer Vision, pp. 555-562 (1998).
[17] Lienhart, R. and Maydt, J., “An extended set of Haar-like features for rapid object detection,” Proceedings of the International
Conference on Image Processing, vol. 1, pp. 900-903 (2002).
[18] Xie, S., Shan, S., Chen, X., and Chen, J., “Fusing Local Patterns of Gabor Magnitude and Phase for Face Recognition,” IEEE
Transactions on Image Processing, 19(5), pp. 1349-1361 (2010).
[19] Ojala, T., Pietikainen, M., and Harwood, D., “A comparative study of texture measures with classification based on feature
distributions,” Pattern recognition, pp. 51-59, (1996).
[20] Lowe, D., “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision, 60(2):91-110,
(2004).
[21] Gepperth, A., Edelbrunner, J., and Bocher, T., “Real-time detection and classification of cars in video sequences,” Intelligent
Vehicles Symposium, pages 625–631 (2005).
[22] Negri, P., Clady, X., Prevost, L., “Benchmarking haar and histograms of oriented gradients features applied to vehicle detection,”
Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics, Robotics and
Automation 1 (ICINCO 2007), Angers, France (2007).
[23] Han, F., Shan, Y., Cekander, R., Sawhney, H. and Kumar, R., “A Two-Stage Approach to People and Vehicle Detection with
HOG-Based SVM,” Proc. Performance Metrics for Intelligent Systems, pp. 133-140 (2006).
[24] Krizhevsky, A., Sutskever, I., Hinton, G.E., “Imagenet classification with deep convolutional neural networks”, Proceedings of
the 25th International Conference on Neural Information Processing Systems - Volume 1, Pages 1097-1105, Lake Tahoe, Nevada,
(2012).
[25] Lettvin, Maturana, McCulloch and Pitts, “What the Frog’s Eye Tells The Frog’s Brain,” Proc. Inst. Radio Engr. 1959, vol. 47
pages 1940-1951, (1959).
[26] Hubel and Wiesel & the Neural Basis of Visual Perception, https://knowingneurons.com/2014/10/29/hubel-and-wiesel-the-
neural-basis-of-visual-perception/ (2014).
[27] ImageNet, http://www.image-net.org.
[28] Simonyan, K., Zisserman, A., “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv technical report,
(2014).
[29] University of Oxford, Visual Geometry Group, http://www.robots.ox.ac.uk/~vgg/research/very_deep/
[30] Russakovsky, O., Deng, J., Su, H., et al., “ImageNet Large Scale Visual Recognition Challenge.” International Journal of
Computer Vision (IJCV). Vol. 115, Issue 3, pp. 211–252 (2015).
[31] Girshick, R., Donahue, J., Darrell, T., and Malik, J., “Rich feature hierarchies for accurate object detection and semantic
segmentation,” In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 580–587 (2014).
[32] Girshick, R., “Fast R-CNN,” In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448 (2015).
[33] Ren, S., He, K., Girshick, R. and Sun, J., “Faster R-CNN: Towards real-time object detection with region proposal networks,” In
Advances in neural information processing systems, pages 91–99 (2015).
[34] Zheng, Y., Yang, C., Merkulov, A., Bandari, M., “Early breast cancer detection with digital mammograms using Haar-like
features and AdaBoost algorithm,” SPIE Proceedings Vol. 9871, Sensing and Analysis Technologies for Biomedical and
Cognitive Applications 2016, 98710D (2016).
[35] Engeland, S.V., Snoeren, P., Hendriks, J., and Karssemeijer, N., “A comparison of methods for mammogram registration,” IEEE
Trans. Medical Imag., Vol. 22, No. 11, pp.1436-1444 (2003).
[36] Pluim, J.P., Maintz, J.B.A. and Viergever, M.A., “Mutual information-based registration of medical images: a survey,” IEEE
Trans. Medical Imag., Vol.. 986-1004, (2003).
