
International Conference on Communication and Signal Processing, April 6-8, 2017, India

Fusion of Two View Binary Patterns to Improve the Performance of Breast Cancer Diagnosis
S. Sasikala, Member, IEEE, and M. Ezhilarasi


Abstract: Breast cancer remains the leading cause of cancer deaths among women globally. To reduce breast cancer mortality, early detection, diagnosis and treatment are important requirements. Computer Aided Diagnosis (CADx) techniques with screening mammography are widely used for this purpose. Further improvements in CAD systems were achieved by using both Medio Lateral Oblique (MLO) and Cranio Caudal (CC) view mammograms. In this study, fusion of Local Binary Patterns (LBP) of MLO and CC view images using Canonical Correlation Analysis (CCA) is proposed to improve the diagnostic accuracy and to reduce the false positive rate of two view systems. Two databases, the Digital Database for Screening Mammography (DDSM) and INbreast, are used to evaluate the performance of the proposed system. A significant improvement in the performance of the double view system is obtained with CCA. Accuracies of 96.1% and 95.3% were obtained for the DDSM and INbreast datasets respectively with serial fusion of LBP features. The proposed system could help radiologists in diagnosis so that treatment can be started earlier in the course of the disease, thereby reducing mortality.

Index Terms: mammogram, LBP, feature level fusion, PCA, CCA

I. INTRODUCTION

Breast cancer results when cells in any part of the breast begin to grow out of control due to abnormal rates of proliferation and differentiation. The incidence of breast cancer has been increasing every year. In addition to its high incidence, it also exhibits a high mortality rate. In India, breast cancer accounts for 25% to 31% of all cancers in women. Delayed diagnosis causes the treatment to become mutilating and aggressive. Therefore, efforts should be made to detect breast cancer at an early stage.

Mammography has been used for breast cancer screening for the last three decades in addition to the breast self-exam. Lesions identified through screening are evaluated and diagnosed as benign or malignant by a radiologist. To reduce the variability among radiologists, CAD systems were introduced. During the last decade, CAD systems aided by breast imaging have made many advances in breast cancer diagnosis and treatment. Though these techniques help the radiologist in the interpretation of both conventional and digital mammograms, no simple technique has been realized to accurately predict the occurrence of breast cancer. Breast cancer mortality could be reduced by improving the performance of CAD systems through the incorporation of new technologies. Fusion of MLO and CC view mammographic features to improve the diagnostic performance of CAD has been addressed by many researchers.

This paper is organized as follows. Section II discusses various double view CAD systems developed earlier by researchers. Section III explains the techniques involved in the overall system development. Section IV details the performance metrics used, Section V presents the results obtained in this work, and the conclusion is given in Section VI.

II. RELATED WORKS

A multi-view analysis based on the Bayesian principle was proposed by Marina Velikova et al. [1]. They showed that multi-view analysis better discriminates normal and cancerous lesions.

A matching between two views based on a geometrical model was performed by Paquerault et al. [2]. They extracted morphological and textural features to classify a lesion pair detected from two views as a true or false abnormality pair using linear discriminant analysis.

Fusion of Daubechies 3 wavelet transform features from CC and MLO views was proposed by Rogerio Daniel Dantas et al. [3]. They used Random Forest and SVM classifiers for analysis and obtained a maximum sensitivity, specificity and accuracy of 83.33%, 82.92% and 83.13% respectively.

Two-view CADx systems using various fusion techniques were proposed by Lavanya et al. [4]. The area under the Receiver Operating Characteristic curve was computed to analyse the performance, and it was shown that the performance of a fusion method depends on the dataset used. Sasikala et al. extracted steerable pyramid features from both MLO and CC views and combined them by concatenation [5] to improve the system performance. The fused features were reduced by PCA and classified by a support vector machine.

Here, an early diagnosis technique using fusion of binary pattern texture features extracted from both MLO and CC view mammograms is proposed. The system could therefore be used to diagnose breast cancer at an earlier stage so that treatment can also be started earlier, which would reduce mortality.

S. Sasikala is with the Kumaraguru College of Technology, India (phone: 9443525425; e-mail: sundarsasi@gmail.com).
Dr. M. Ezhilarasi is with the Kumaraguru College of Technology, India (e-mail: ezhilarasimuthusamy@gmail.com).

978-1-5090-3800-8/17/$31.00 ©2017 IEEE

III. SYSTEM DEVELOPMENT

A. Methodology
The overall system architecture is illustrated in Fig. 1. Patient data containing both MLO and CC view mammograms are collected from the database and preprocessed to remove noise and other unnecessary artifacts such as labels and pectoral muscle. The preprocessed images are then segmented to obtain the tumor present in those images. After segmentation, the Local Binary Pattern features are extracted separately from the tumor regions segmented from the two views.
A baseline system is designed in which features from the MLO or CC view are individually reduced by Principal Component Analysis (PCA) and classified by a Support Vector Machine (SVM). To improve the performance further, a double view scheme is proposed in which the features from the two views are reduced by PCA or Canonical Correlation Analysis (CCA), fused in either serial or parallel fashion, and then classified by an SVM. The results of these systems are compared by computing eight relevant performance metrics.

B. Databases
Two databases are used to design and evaluate the proposed systems. One is the Digital Database for Screening Mammography (DDSM) [6], which contains digitized film screen mammograms. The other is INbreast [7], which consists of Full Field Digital Mammography (FFDM) images.

C. Preprocessing and Segmentation
The preprocessing step involves denoising and pectoral muscle removal. The noise present in the images is removed by median filtering. After removing the noise, the images are enhanced by Contrast Limited Adaptive Histogram Equalization (CLAHE). Labels in the mammogram, such as the hospital name and equipment name, do not contribute any information to the classification of masses; hence they are removed based on an area constraint. The labels are much smaller in area than the breast. First, the individual connected components in the thresholded image are labeled, and then the text labels are removed by discarding the regions of smaller area.
The pectoral muscle appears as a high density triangular region, similar to the dense tissues of interest, at the upper posterior part of the MLO mammogram. As the pectoral muscle could be interpreted as an abnormal mass during processing, it increases the complexity of CAD systems, leads to wrong interpretations in diagnosis and produces false positives. Hence it is necessary to remove the pectoral muscle from mammograms before searching for abnormalities. To remove the pectoral muscle present in MLO images, thresholding is applied to the denoised mammograms and the pectoral muscle is identified from the thresholded output. The identified pectoral muscle is manually cropped and removed.
The tumorous regions from both views are segmented using fuzzy c-means clustering (FCM). In FCM, each data point has a set of degrees of belonging relative to all clusters. FCM assigns partial membership degrees between 0 and 1 to data points on the class boundaries instead of forcing them to fully belong to one of the classes; thus a data point can belong to all clusters with different membership grades. The degree of belonging is a function of the distance between the data point and the centroid, and includes a control parameter so that the highest weight is given to the closest data point. FCM minimizes an objective function that measures the similarity between each measured data point and the centroid of a cluster. Fig. 2 shows the output images of the preprocessing and segmentation stages applied to a mammogram of a malignant case taken from the DDSM database.

D. Feature Extraction
The original LBP operator introduced by Ojala et al. [8] is used here. It forms a label for each pixel in an image by thresholding the 3 x 3 neighborhood of that pixel with the center value and considering the result as a binary number. The neighborhood positions having values larger than the value of the centre pixel are coded as binary 1 and the other neighborhood positions are coded as binary 0. The eight neighbors of the centre pixel are thus represented by an 8-bit unsigned number. The final LBP code for a pixel is obtained by element wise multiplication of the generated binary code with the positional weights of the neighbors and adding all the results. The entire process of LBP extraction from a 3 x 3 neighborhood is illustrated in Fig. 3. In the same way, the LBP codes for all the pixels in the image are formed. Then the mean, standard deviation, entropy, Root Mean Square (RMS) value, variance, smoothness, kurtosis and skewness of the resultant matrix are calculated as texture features.

E. Feature Level Fusion
Feature level fusion can be performed in either serial or parallel fashion. In serial fusion, the two feature vectors are simply concatenated to obtain the new fused feature vector. In parallel fusion, they are made equal in dimension by zero padding and then summed. Prior to fusion, the features are transformed to reduce the dimension so that prominent features can be obtained. In this work, Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are used to reduce the dimensionality. In PCA, the feature vector is projected on the eigenvectors of its covariance matrix corresponding to the largest M eigenvalues which are less than a fixed threshold value [9]. PCA provides a set of linear transformations corresponding to the given input data and preserves the maximum possible randomness present in the original data.
In CCA, the mutual information between two feature sets is generally examined. The cross correlation between the input feature sets $x_1$ and $x_2$ is used to get two new sets $\dot{x}_1 = w_{x_1}^T x_1$ and $\dot{x}_2 = w_{x_2}^T x_2$. The transformations are obtained in such a way that $\dot{x}_1$ and $\dot{x}_2$ maximize their cross correlation and at the same time minimize their autocorrelation [10]. Maximization is performed using Lagrangian multipliers subject to the constraint that the variances of $\dot{x}_1$ and $\dot{x}_2$ are set to unity. The eigenvalue equations in (1) are solved to find the transformation matrices $w_{x_1}$ and $w_{x_2}$.
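As an illustration of the serial and parallel fusion rules described in Section III-E, the following minimal Python sketch concatenates two feature vectors (serial) or zero-pads the shorter one and adds them element-wise (parallel). The function names and the example vectors are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Sequence


def serial_fusion(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Serial fusion: concatenate the two feature vectors."""
    return list(a) + list(b)


def parallel_fusion(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Parallel fusion: zero-pad the shorter vector, then add element-wise."""
    n = max(len(a), len(b))
    a_pad = list(a) + [0.0] * (n - len(a))
    b_pad = list(b) + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a_pad, b_pad)]


# Hypothetical reduced feature vectors for the MLO and CC views.
mlo = [0.5, 1.5, 2.0]
cc = [1.0, 0.25]

fused_serial = serial_fusion(mlo, cc)      # length 3 + 2 = 5
fused_parallel = parallel_fusion(mlo, cc)  # length max(3, 2) = 3
```

Serial fusion keeps every component of both views at the cost of a longer vector, whereas parallel fusion keeps the dimension of the longer vector but mixes the two views in each component.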

Fig. 1. Proposed system architecture. MLO branch: noise and pectoral muscle removal, segmentation, LBP feature extraction, feature reduction, SVM (MLO system). CC branch: noise removal, segmentation, LBP feature extraction, feature reduction, SVM (CC system). The reduced features of both branches are also combined by feature level fusion and classified by a third SVM (MLO-CC system).

Fig. 2. Preprocessing of a malignant image from DDSM dataset
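The paper does not state the membership formula used in the FCM segmentation step. The sketch below assumes the standard fuzzy c-means membership update, in which the fuzzifier `m` plays the role of the control parameter; the default `m = 2`, the function name and the example gray levels are all assumptions made for illustration.

```python
from typing import List


def fcm_memberships(value: float, centroids: List[float], m: float = 2.0) -> List[float]:
    """Membership degrees of one gray level in each cluster (standard FCM update).

    `m` is the fuzzifier (the control parameter); m = 2 is an assumed default.
    """
    dists = [abs(value - c) for c in centroids]
    # A point that coincides with a centroid belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    # u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)); closer centroids get larger weights.
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]


# Hypothetical gray level and two cluster centroids (e.g. background and tumor).
u = fcm_memberships(100.0, centroids=[50.0, 200.0])
```

In this example the gray level is twice as close to the first centroid as to the second, so it receives the larger membership in the first cluster, and the memberships sum to one.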

    68 79 32        1 1 0
    92 50 89   ->   1 P 1      LBP code for pixel P: 11010001 (binary) = 209 (decimal)
    42 18 27        0 0 0

Fig. 3. Extraction of Local Binary Patterns
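The thresholding and weighting steps illustrated in Fig. 3 can be sketched in a few lines of Python. The sketch uses the 3 x 3 neighborhood of Fig. 3 (68, 79, 32 / 92, 50, 89 / 42, 18, 27) and codes a neighbor as 1 when it is larger than the centre pixel, as described in Section III-D. The clockwise bit order starting at the top-left neighbor is an assumption, chosen because it reproduces the code shown in the figure; the function names are hypothetical.

```python
from typing import List

# Clockwise neighbor offsets (row, col) starting at the top-left pixel.
# This ordering is an assumption chosen because it reproduces the code in Fig. 3.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(patch: List[List[int]]) -> int:
    """LBP code of the centre pixel of a 3 x 3 patch."""
    centre = patch[1][1]
    code = 0
    for dr, dc in OFFSETS:
        bit = 1 if patch[1 + dr][1 + dc] > centre else 0
        code = (code << 1) | bit  # first neighbor ends up as the most significant bit
    return code


def lbp_image(img: List[List[int]]) -> List[List[int]]:
    """LBP codes for all interior pixels (border pixels are skipped)."""
    rows, cols = len(img), len(img[0])
    return [
        [lbp_code([row[c - 1:c + 2] for row in img[r - 1:r + 2]]) for c in range(1, cols - 1)]
        for r in range(1, rows - 1)
    ]


patch = [[68, 79, 32],
         [92, 50, 89],
         [42, 18, 27]]
code = lbp_code(patch)
print(format(code, "08b"), code)  # 11010001 209
```

Shifting the code left by one bit per neighbor is equivalent to multiplying each thresholded bit by its power-of-two weight and summing the results.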

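$$S_{x_1 x_1}^{-1} S_{x_1 x_2} S_{x_2 x_2}^{-1} S_{x_2 x_1} \hat{w}_{x_1} = \Lambda^2 \hat{w}_{x_1}$$
$$S_{x_2 x_2}^{-1} S_{x_2 x_1} S_{x_1 x_1}^{-1} S_{x_1 x_2} \hat{w}_{x_2} = \Lambda^2 \hat{w}_{x_2} \qquad (1)$$

where $\hat{w}_{x_1}$ and $\hat{w}_{x_2}$ are the eigenvectors and $\Lambda^2$ is the diagonal matrix of eigenvalues or squares of the canonical correlations. The transformation matrices $w_{x_1}$ and $w_{x_2}$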

consist of the eigenvectors corresponding to the non-zero eigenvalues, sorted in decreasing order of eigenvalue.

F. Classification
After performing the feature level fusion, the resultant feature set is classified by a Support Vector Machine (SVM) classifier [11]. SVM uses the training data to create an optimal separating hyperplane between the two classes that maximizes the margin between the hyperplane and each class. SVMs that use linear or nonlinear kernels are called linear or nonlinear SVMs respectively. A linear SVM can be used for the classification of linearly separable data. Data that are not linearly separable are transformed into linearly separable data in a higher dimensional space using nonlinear kernels. In this work, an SVM with the nonlinear Radial Basis Function (RBF) kernel is used for differentiating a tumor as benign or malignant. A confusion matrix is derived from the classification result, and the performance metrics are computed from that confusion matrix.

IV. PERFORMANCE EVALUATION
Marina Sokolova et al. [12] reported that a single metric such as accuracy is not sufficient to determine the overall system performance; instead, a combination of metrics gives an unbiased estimate. In this work, the four performance metrics given in Table I are used to quantify the performance of the proposed system [13]. Accuracy gives the percentage of correct classifications. To assess the effectiveness of the system on a single class, sensitivity and specificity are used. The F1 measure, the harmonic mean of precision and recall, provides a single measure of a test's accuracy.

TABLE I
PERFORMANCE METRICS
S.No  Measure                                  Symbol         Formula
1.    Accuracy                                 Acc            (TP + TN) / (FP + FN + TP + TN)
2.    True Positive Rate/Recall/Sensitivity    TPR (Rec/Sen)  TP / P = TP / (FN + TP)
3.    True Negative Rate/Specificity           TNR (Spec)     TN / N = TN / (FP + TN)
4.    F1 score                                 F1             2 (PRE x REC) / (PRE + REC)

V. RESULTS AND DISCUSSION
The statistical measures computed from LBP are used as texture features to distinguish benign and malignant tumors. These are computed separately for the MLO and CC views of the DDSM and INbreast datasets.
In the single view baseline system, features from the MLO or CC view are individually reduced by PCA and classified by SVM. The performance metrics computed for these systems using the DDSM and INbreast datasets are tabulated in Table II.

TABLE II
PERFORMANCE OF SINGLE VIEW BASELINE SYSTEMS
Dataset   View  Acc (%)  Sen (%)  SPC (%)  F1
DDSM      CC    85.4     85.2     85.7     0.87
DDSM      MLO   88.3     88.3     88.4     0.90
INbreast  CC    84.9     85.7     84.1     0.85
INbreast  MLO   87.2     82.0     94.4     0.88

Then, a double view system is proposed in which the MLO and CC view features are fused in a serial or parallel manner after PCA reduction to improve the diagnostic performance. The results of the fused systems are tabulated in Table III. A significant improvement in the performance of the fused system is observed when compared with the baseline system.

TABLE III
PERFORMANCE OF DOUBLE VIEW SYSTEMS WITH PCA
Dataset   Fusion    Acc (%)  Sen (%)  SPC (%)  F1
DDSM      Parallel  93.2     93.2     93.2     0.94
DDSM      Serial    95.1     94.9     95.5     0.96
INbreast  Parallel  90.7     90.7     90.7     0.91
INbreast  Serial    93.0     95.1     91.1     0.93

For further improvement in the performance, the single view features are reduced by CCA instead of PCA before fusing them together. The results of these double view systems with CCA reduction are given in Table IV.
From Table II and Table III, it is observed that the performance of the system is improved when the features from the two views are fused serially after applying PCA reduction. Table IV shows that the performance metrics are improved further if CCA is used for feature reduction. The accuracy, sensitivity, specificity and F1 score are 96.1%, 96.6%, 95.6% and 0.97 for DDSM and 95.3%, 91.5%, 100% and 0.96 for INbreast respectively when the LBP features are fused serially after applying CCA feature reduction.
LBP is robust to monotonic gray-level changes and computationally very simple. CCA transforms the feature vectors in such a way that the transformed vectors have maximum cross correlation and minimum autocorrelation, and produces a more discriminative feature set for better classification. Hence, the proposed system exhibits a significant improvement in accuracy, sensitivity, specificity and F1 score. The performance of the proposed system is compared with the existing systems in terms of accuracy, sensitivity and specificity in Table V.
Table V shows that the proposed double view CADx system, based on serial fusion of statistical LBP features from MLO and CC view mammograms of the same patient using canonical correlation analysis, compares favorably with the previous works on these metrics.
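The four measures of Table I can be computed directly from the entries of a confusion matrix. The sketch below is a minimal illustration; the function name is hypothetical, the counts are made up and are not taken from the paper's experiments, and precision (PRE) is assumed to have its usual definition TP / (TP + FP), which Table I does not spell out.

```python
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # TPR / recall
    specificity = tn / (tn + fp)   # TNR
    precision = tp / (tp + fp)     # assumed standard definition of PRE
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"acc": accuracy, "sen": sensitivity, "spec": specificity, "f1": f1}


# Hypothetical counts, not taken from the paper's experiments.
m = binary_metrics(tp=45, tn=40, fp=10, fn=5)
```

With these counts the accuracy is 0.85, the sensitivity 0.90 and the specificity 0.80, which shows how a single accuracy figure can hide a difference between the two class-wise rates.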

TABLE IV
PERFORMANCE OF DOUBLE VIEW SYSTEMS WITH CCA
Dataset   Fusion    ACC (%)  TPR/SEN (%)  TNR/SPC (%)  F1
DDSM      Parallel  94.2     93.3         95.3         0.95
DDSM      Serial    96.1     96.6         95.6         0.97
INbreast  Parallel  91.9     89.1         95           0.92
INbreast  Serial    95.3     91.5         100          0.96

TABLE V
PERFORMANCE COMPARISON WITH EXISTING WORKS
Metrics      Rogerio et al.  Sasikala et al.  Proposed Work (DDSM)  Proposed Work (INbreast)
Accuracy     83.13%          96.32%           96.1%                 95.3%
Sensitivity  77.08%          95.45%           96.6%                 91.5%
Specificity  89.17%          95.45%           95.6%                 100%

VI. CONCLUSION
In this work, an attempt is made to improve the performance of breast cancer diagnosis. The single view CAD systems using either the MLO or the CC view are evaluated first. Then the performance of double view CAD system with parallel and
Our future work will focus on the implementation of this system in hardware so that it can be used in hospitals and scan centers in real time. This could help patients with breast abnormalities avoid unnecessary biopsies and other invasive procedures for the diagnosis of cancer.

ACKNOWLEDGMENT
The authors would like to express their gratitude to Dr. Thomas Deserno (nee Lehmann), Department of Medical Informatics, Aachen University of Technology, D-52057 Aachen, Germany, and Jaime S. Cardoso, Breast Research Group, INESC Porto, Faculdade de Engenharia, Universidade do Porto, for providing the DDSM and INbreast datasets respectively.

REFERENCES
[1] Velikova, Marina, et al. "Improved mammographic CAD performance using multi-view information: a Bayesian network framework." Physics in Medicine and Biology 54.5 (2009): 1131.
[2] Paquerault, Sophie, et al. "Improvement of computerized mass detection on mammograms: Fusion of two-view information." Medical Physics 29.2 (2002): 238-247.
[3] Jacomini, Souza, Danilo César Pereira, and Rodrigo Pereira Ramos. "Fusion of Two-View Information: SVD Based Modeling for Computerized Classification of Breast Lesions on Mammograms."
[4] Lavanya, R., and Amrita Vishwa Vidyapeetham. "Comparison of Fusion Schemes for Two-View Analysis Of Breast Cancer Using Mammograms."
[5] S. Sasikala, M. Ezhilarasi & A. Rasheedha, "Breast cancer diagnosis using texture features from both MLO & CC view mammograms". International Journal of Applied Engineering Research, 10 37 (2015): 27934-27939.
[6] Heath, M., Bowyer, K., Kopans, D., Kegelmeyer Jr, P., Moore, R., Chang, K., & Munishkumaran, S. (1998). Current status of the digital database for screening mammography. In Digital mammography (pp. 457-460). Springer Netherlands.
[7] Moreira, I. C., Amaral, I., Domingues, I., Cardoso, A., Cardoso, M. J., & Cardoso, J. S. (2012). INbreast: toward a full-field digital mammographic database. Academic radiology, 19(2), 236-248.
[8] Ojala, T., Pietikäinen, M., & Mäenpää, T. (2001, March). A generalized local binary pattern operator for multiresolution gray scale and rotation invariant texture classification. In International Conference on Advances in Pattern Recognition (pp. 399-408). Springer Berlin Heidelberg.
[9] Jolliffe, I. (2002). Principal component analysis. John Wiley & Sons, Ltd.
[10] Haghighat, M., Abdel-Mottaleb, M., & Alhalabi, W. (2016). Fully automatic face normalization and single sample face recognition in unconstrained environments. Expert Systems with Applications, 47, 23-34.
[11] Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992, July). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory (pp. 144-152). ACM.
[12] Sokolova, Marina, Nathalie Japkowicz, and Stan Szpakowicz. "Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation." Australasian Joint Conference on Artificial Intelligence. Springer Berlin Heidelberg, 2006.
[13] Raschka, Sebastian. "An Overview of General Performance Metrics of Binary Classifier Systems." arXiv preprint arXiv:1410.5330 (2014).
