
IET Computer Vision

Special Issue: Computer Vision in Cancer Data Analysis

Mammographic mass classification using filter response patches

ISSN 1751-9632
Received on 21st December 2017
Revised 25th April 2018
Accepted on 29th June 2018
E-First on 13th August 2018
doi: 10.1049/iet-cvi.2018.5244
www.ietdl.org

Zobia Suhail1, Azam Hamidinekoo2, Reyer Zwiggelaar2


1Punjab University College of Information Technology, University of the Punjab, Lahore, Pakistan
2Department of Computer Science, Aberystwyth University, Wales SY23 3DB, UK
E-mail: zobia.suhail@pucit.edu.pk

Abstract: Considering the importance of early diagnosis of breast cancer, a supervised patch-wise texton-based approach has
been developed for the classification of mass abnormalities in mammograms. The proposed method is based on texture-based
classification of masses in mammograms and does not require segmentation of the mass region. In this approach, patches from
filter bank responses are utilised for generating the texton dictionary. The methodology is evaluated on the publicly available
Digital Database for Screening Mammography database. Using a naive Bayes classifier, a classification accuracy of 83% with
an area under the receiver operating characteristic curve of 0.89 was obtained. Experimental results demonstrated that the
patch-wise texton-based approach in conjunction with the naive Bayes classifier offers an efficient alternative approach
for automatic mammographic mass classification.

1 Introduction

Breast cancer is the most frequently diagnosed cancer worldwide, and accounts for 25.2% of the total cancer-related deaths among women [1]. Mammography, as the primary imaging modality for diagnosing breast cancer, is used to identify abnormalities at an early stage when treatment can be most effective. Architectural distortion, (micro)calcifications and masses are considered the most commonly found abnormalities on mammographic scans (called mammograms) [2]. The size, morphology and density of these abnormalities can be used in diagnosing their potentially cancerous (malignant) or non-cancerous (benign) nature [2], although currently all abnormalities are assessed with histology. However, there is significant variation in these morphological features, which makes manual evaluation difficult. Therefore, building systems which can effectively provide automatic detection, segmentation and classification of such lesions based on machine learning methods has become one of the challenging areas in mammographic computer aided diagnosis (CAD) systems. Separate or combinational CAD systems have been developed for analysing these types of abnormalities to provide radiologists with a second opinion. Classification of benign and malignant masses in mammograms is a complex task due to the low contrast between the mass and the surrounding breast tissues [3, 4]. In this study, we address the problem of mass classification in mammograms [5] and propose a new method for the classification of benign and malignant mammographic masses using a modified version of a texton-based approach, with patches extracted from filter response images. The novelty of the work is to exploit the neighbouring structures from multiple filter response images of mammographic masses in order to extract a more detailed representation of the mass. As small patches are used for the overall model development and evaluation, the particular mass area is emphasised by avoiding unnecessary details from the area containing breast tissue. The idea can be applied to the classification of other breast abnormalities (e.g. micro-calcifications), as the concept of using small patches from the filter response images extracts multiple localised aspects of the abnormalities. From each filter response image, the small patches represent the distribution of the data separately for breast tissue and the abnormalities. Hence, by using the proposed patch-based texton approach, a generalised CAD system can be developed that is equipped to classify several breast abnormalities.

2 Related work

A ‘mass’ is a space-occupying lesion that can be projected in different views (e.g. Cranial-Caudal and MedioLateral-Oblique). A potential mass seen in only a single projection is called a ‘density’ until its three dimensionality is confirmed. According to the breast imaging reporting and data systems [6], the information about mass shape, margin and density can be considered to define it as benign or malignant. In the CAD literature, common features which are used for mass classification include intensity, size, shape, texture and location associated features [7–9].

Several textural features have been studied in the past for the purpose of classifying mammographic masses as either benign or malignant [7, 8, 10–12]. Mudigonda et al. [7] extracted grey-level co-occurrence matrix (GLCM)-based texture features from the mass region and ribbon pixels surrounding the mass region in order to classify the mass as benign or malignant. They extracted the boundary of the mass area using polygon modelling. They evaluated the performance of their proposed method on 54 images, of which 28 were benign and 26 were malignant, and reported the best classification accuracy (CA) to be 82.1% with an area under the receiver operating characteristic (ROC) curve equal to 0.85.

Campos et al. [10] used independent component analysis (ICA) to extract features for classifying mammograms as either normal, benign or malignant. They used the Mammographic Image Analysis Society (MIAS) database [total 200 regions of interest (RoIs)] for evaluation of classification performance with neural networks (NNs) (multilayer perceptron NNs, probabilistic NNs and radial basis function NNs). They reported that the best results were achieved by using probabilistic NNs, leading to an accuracy of 97.3%.

Valarmathie et al. [11] used shape, margin and textural features of pre-segmented masses including average intensity, average contrast, smoothness of the intensity in the region, uniformity of intensity and skewness of the histogram. They used images from the mini-MIAS database, from which 200 images were used for training and 132 for testing. Their best reported CA was equal to 94% using a combination of margin and shape features. Similarly, Rangayyan et al. [13] proposed a region-based measure of image edge profile acutance along with other shape features like

IET Comput. Vis., 2018, Vol. 12 Iss. 8, pp. 1060-1066 1060


© The Institution of Engineering and Technology 2018
compactness, Fourier descriptors, moments and chord length in order to classify circumscribed and spiculated tumours. They used 54 images (39 from MIAS and 15 local cases) for experiments. They reported the importance of adding edge-based features to shape information: their proposed acutance alone achieved a CA of 95%, whereas a similar CA was achieved by combining acutance with a moment-based shape measure and Fourier descriptors. Boujelben et al. [14] used the boundary information of the mass region to characterise masses as benign or malignant. They used features including radial distance measure, convexity and angular features to describe the boundary of the mass region. The Digital Database for Screening Mammography (DDSM) database was used, where 200 images (100 benign and 100 malignant) were used for performance evaluation. The classification was performed using K-nearest neighbour (KNN) and multilayer perceptron (MLP) classifiers and the best results were reported using an MLP classifier, where they achieved a specificity equal to 97.9% and sensitivity equal to 94.2%.

Mu et al. [8] used 22 shape and textural features (5 shapes, 3 edge-sharpness measures and 14 textures). Subsequently, a genetic algorithm (GA) was used for feature selection and improved classification results (Az = 0.95) were reported by using the combination of shape, edge sharpness and texture features. Rouhi et al. [12] extracted shape, intensity and texture features after segmenting the mass regions. Performing feature selection using GA, classification results were reported on different classifiers including random forest, naive Bayes, support vector machine (SVM) and KNN; using the MIAS and DDSM databases, they reported their best CA as 96.47%. Ertas et al. [15] computed geometric parameters (area, perimeter, circularity, normalised circularity, Fourier descriptors etc.) for the basic shape of the segmented mass area and reported that normalised circularity and Fourier coefficients can be used more effectively in order to classify benign and malignant masses. They reported a CA of 92% (23 out of 25 biopsy-proven mammographic masses were correctly classified) by analysing shape features. Dong et al. [16] used a feature set comprising shape, margin, texture and intensity features (32-dimensional feature set) for the classification of benign and malignant masses in mammograms after segmenting the mass region. They evaluated the performance of their proposed feature set with several classifiers (random forest, SVM, decision tree etc.) using different combinations of features and reported the best CA equal to 97.73% using the combination of all features with an optimised version of SVM. Ball and Bruce [9] evaluated the validity of their proposed level set segmentation technique by classifying the mass region into benign and malignant. They extracted two distinct feature sets, where the first feature set consisted of patient age, morphological features, normalised radial length based features and the features extracted from GLCM. The second feature set contained features extracted from the segmented boundary and the area outside the boundary region. After selecting optimal features using stepwise linear discriminant analysis, they performed the classification using a KNN classifier and reported an overall CA equal to 87%.

On the other hand, instead of selecting features from the segmented mass area, it is possible to extract features from RoIs in order to classify the mass abnormality [17, 18]. In [17], a CAD system for three different types of mammographic patches, i.e. normal, benign and malignant, was proposed. The local configuration pattern (LCP) algorithm was used for feature extraction, and the result was concatenated with statistical and frequency domain features. They reported the results of classifying the new ensemble features using four different classifiers, where the best accuracy achieved was 94.67%. Another three-class classification method was proposed by Buciu and Gacsadi [18], in which the directional features were extracted by filtering image patches using Gabor wavelets. After reducing the data dimensionality (for both filtered and original mammographic patches) using principal component analysis (PCA), they reported better classification results when Gabor features were used instead of the original mammographic images. They reported best results of Az = 0.79 for normal versus tumour cases and Az = 0.78 for benign versus malignant tumours.

Texton-based approaches have shown promising results for generic texture classification using single images [19], in which the textures are represented as histograms of textons in a texton dictionary. Varma and Zisserman [19] incorporated filter responses to build a texton dictionary. To improve the method, they used a joint distribution of intensity values for patches of a particular size [20] and concluded that raw image patches could be used to classify different textures. They reported an improved CA of 97.17% when a joint distribution of intensities was used, compared to a CA of 96.37% for the filter responses, using the same experimental setup for both approaches.

The work done in the past on the classification of benign and malignant masses using textons is limited compared to other texture classification techniques [11, 21]. Unlike the texton-based work proposed by Li et al. [22], who used sub-sampling before applying a texton-based approach (with the CA equal to 85.96%), we have used a modified version of the texton-based approach on the original dataset without applying any sub-sampling of the data; instead, we have subdivided the filter responses into small patches and used those patches to build the texton dictionary. The proposed work used the patches from the filter response images rather than the raw data patches [20] from a single image to build the model for classification. The first reason for developing this approach is that the filter responses will enhance the characteristic features of masses, e.g. edges and linear aspects. A second reason for using filter response images rather than raw image patches is the clear difference in CA achieved by using patches from the filter response images rather than the raw image data (covered in Section 5.1). So the novelty of this method is the aggregation of several pixel responses for each filter response image, which is shown to be more useful in getting appropriate descriptors for the classification of benign and malignant mammographic masses.

3 Methodology

The proposed method is based on a modified version of the work done by Varma and Zisserman [19]. We used patches from filter response images for model generation, unlike the original work where the authors used the filter response for each pixel for the purpose of model generation and evaluation. The overall process consists of four main steps, i.e. dataset preparation, generating filter responses, model generation and model evaluation. The details of each step are described in subsequent sub-sections. An overview of the full methodology is shown in Fig. 1.

3.1 Data preparation

In our experiment, the DDSM [23] was used, which comprises ∼2500 cases and also provides associated patient information (age, subtlety rating for abnormalities etc.), scanner and image information (spatial resolution) in addition to pixel-level ‘ground truth’ for the regions containing the abnormality. The dataset has been used by the research community to evaluate the performance of developed methods for segmentation/classification of a range of mammographic abnormalities. A subset of 300 mass lesions (150 benign; 150 malignant) was randomly selected from the DDSM database. Using the annotated RoI provided as part of the dataset, a bounding box for each mass was extracted and resized to 225 × 225 pixels.

Data is needed for texton model generation and subsequently for mass classification. To avoid bias, we have split the 300 cases into separate texton model generation and evaluation datasets. From the selected dataset, 200 cases were used for the texton model generation and the remaining 100 cases, which are distinct from the model generation data, were used for evaluation of the proposed method.

We have used the generated texton models for the remaining 100 cases to obtain the features to be used for mass classification. The evaluation dataset was partitioned into training and testing [using ten-fold cross-validation (10-FCV), see Section 4] in such a way that no abnormality from a patient was included in both the

Fig. 1  Generating the texton dictionary from patches of filter responses. As can be seen, all filter response images are sub-divided into non-overlapping
patches of size 7 × 7 before texton dictionary generation
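The dictionary-building step summarised in the Fig. 1 caption can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes NumPy only, uses a plain Lloyd-style K-means written out by hand, and runs on small random arrays standing in for 175 × 175 filter responses. The function names (`extract_patches`, `kmeans`, `build_dictionary`) are hypothetical.

```python
import numpy as np

def extract_patches(response, size=7):
    """Split a 2D filter response into non-overlapping size x size patches,
    returned as rows of length size * size."""
    h, w = response.shape
    rows, cols = h // size, w // size
    trimmed = response[:rows * size, :cols * size]
    blocks = trimmed.reshape(rows, size, cols, size).swapaxes(1, 2)
    return blocks.reshape(rows * cols, size * size)

def kmeans(data, k, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns the k cluster means (the textons)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # squared Euclidean distance from every patch to every centroid
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def build_dictionary(benign_responses, malignant_responses, k=10):
    """Cluster each class separately and stack the textons (benign first)."""
    per_class = []
    for responses in (benign_responses, malignant_responses):
        patches = np.vstack([extract_patches(r) for r in responses])
        per_class.append(kmeans(patches, k))
    return np.vstack(per_class)

# Toy stand-ins for filter responses (assumption: random data, 2 images per class).
rng = np.random.default_rng(1)
benign = [rng.normal(size=(175, 175)) for _ in range(2)]
malignant = [rng.normal(loc=0.5, size=(175, 175)) for _ in range(2)]
patches = extract_patches(benign[0])
dictionary = build_dictionary(benign, malignant)
print(patches.shape, dictionary.shape)
```

With 10 clusters per class, the stacked dictionary holds 20 textons of length 49, matching the sizes quoted in the paper; the clustering settings (iteration count, initialisation) are placeholders.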

training and testing data (in the DDSM dataset some cases contain more than one abnormality). The details of producing the dataset can be seen in Fig. 2.

Fig. 2  Process of generating the dataset. The red boundaries in the mammogram on the left side show the annotations provided with the dataset, whereas the blue squares around the red boundary show the bounding box around the abnormality. The bounding box is extracted and resized to 225 × 225 pixels

3.2 Generating filter responses

A combination of 53 filters (of size 49 × 49) from the maximum response (MR) and Schmid (S) filter banks [19] was selected; the filters are shown in Fig. 3. The first 40 filters are obtained from the MR filter bank and are a mixture of edge, bar, Gaussian and Laplacian of Gaussian (LoG) filters. The edge and bar filters are obtained at six orientations and three scales [(σx, σy) = (1, 3), (2, 6), (4, 12)]; the Gaussian filter uses σ = 10 and the three LoG filters use σ = 3, 10, 12. The other 13 filters are obtained from the S filter bank, including various isotropic Gabor-like filters [24] of the form

$$F(r, \sigma, \tau) = F_0(\sigma, \tau) + \cos\left(\frac{\pi \tau r}{\sigma}\right) e^{-r^2/(2\sigma^2)} \qquad (1)$$

with values for the (σ, τ) pair set as (2,1), (4,1), (4,2), (6,1), (6,2), (6,3), (8,1), (8,2), (8,3), (10,1), (10,2), (10,3) and (10,4). Taken together, the filters are rotationally invariant, isotropic and anisotropic, multi-scale and at multiple orientations. With such characteristics, they can generate appropriate features for various types of textures. The RoIs were then convolved with each of the filters (see Section 3.1). The resulting filter responses on one of the sample patches from the input dataset are shown in Fig. 3. The filter responses were normalised to zero mean and unit variance. Experiments showed that this pre-processing improved the overall classification results.

3.3 Model generation

After getting filter response images, each RoI is sub-divided into 7 × 7 patches. The number of patches extracted from each filter response image is 625 (each 175 × 175 filter response image yields 25 × 25 = 625 patches of size 7 × 7). As already explained in Section 3.1, the texton model generation data comprised 200 images (100 benign and 100 malignant) and there are in total 53 filter responses for each RoI. This gives 625 × 53 × 100 = 3,312,500 patches for each of the benign and malignant classes. The patches were extracted in a non-overlapping fashion; additional patches could be extracted, but this was not done at this stage due to computer memory restrictions. All the extracted patches are converted to 1D vectors of length 49. At this point, texton model generation data for a particular class c is defined as

$$D_{c \in (\text{benign} \mid \text{malignant})} = [X_1, X_2, X_3, \ldots, X_m] \qquad (2)$$

where m = 3,312,500 and Xi is defined as

$$X_i = [y_1, y_2, y_3, \ldots, y_n] \qquad (3)$$

with n = 49, and Xi is a 1D vector representation of a 7 × 7 patch.

After data generation, a texton dictionary was built using K-means clustering as in [19], estimating k textons by solving the following optimisation problem for a particular class c:

$$\underset{T}{\arg\min} \sum_{j=1}^{k} \sum_{X_k \in D_c} \lVert X_k - T_j \rVert^2 \qquad (4)$$

After applying K-means clustering for each of the benign and malignant classes, texton model generation data for both classes were partitioned into k clusters, and the texton Tj is the mean vector of a particular cluster j (j ≤ k). For the current work, we used 10 clusters for each of the benign and malignant classes (10 textons/class). The final texton dictionary was generated by merging the textons from both classes. This
Fig. 3  Process of generating filter responses for a sample image. Fifty-three filters (from the MR and S filter banks) are shown on the left side of the image.
The middle image shows a sample image that has been convolved with the filters and the results are shown on the right
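The filtering step illustrated in Fig. 3 can be sketched for the 13 Schmid-type filters, whose form in Section 3.2 is F0(σ, τ) + cos(πτr/σ) exp(−r²/(2σ²)). The following is a hedged sketch, not the authors' code. It assumes that F0 is realised by subtracting the filter mean (zero DC component), that each filter is L1-normalised, and that a 225 × 225 RoI is reduced to a 175 × 175 response by discarding a 25-pixel border; the paper does not state its border handling, so `margin=25` is a guess that reproduces the reported size. The names `schmid_filter` and `filter_response` are hypothetical.

```python
import numpy as np

# (sigma, tau) pairs listed in Section 3.2
SCHMID_PAIRS = [(2, 1), (4, 1), (4, 2), (6, 1), (6, 2), (6, 3), (8, 1),
                (8, 2), (8, 3), (10, 1), (10, 2), (10, 3), (10, 4)]

def schmid_filter(sigma, tau, size=49):
    """Isotropic Gabor-like filter: cos(pi*tau*r/sigma) * exp(-r^2 / (2 sigma^2))."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r = np.hypot(x, y)
    f = np.cos(np.pi * tau * r / sigma) * np.exp(-r ** 2 / (2 * sigma ** 2))
    f -= f.mean()                 # assumption: F0 chosen to give a zero DC component
    return f / np.abs(f).sum()    # assumption: L1 normalisation

def filter_response(image, kernel, margin=25):
    """FFT-based 2D convolution, border crop, then zero mean and unit variance."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    shape = (ih + kh - 1, iw + kw - 1)
    spectrum = np.fft.rfft2(image, s=shape) * np.fft.rfft2(kernel, s=shape)
    full = np.fft.irfft2(spectrum, s=shape)
    top, left = (kh - 1) // 2, (kw - 1) // 2
    same = full[top:top + ih, left:left + iw]            # same size as the input
    cropped = same[margin:ih - margin, margin:iw - margin]
    return (cropped - cropped.mean()) / cropped.std()

bank = [schmid_filter(s, t) for s, t in SCHMID_PAIRS]
roi = np.random.default_rng(0).normal(size=(225, 225))   # stand-in for a resized RoI
responses = [filter_response(roi, f) for f in bank]
print(len(responses), responses[0].shape)
```

The 40 MR filters (edge, bar, Gaussian and LoG) would be applied in the same way once constructed; they are omitted here to keep the sketch short.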

Fig. 4  Process of generating histograms from a test image. Each red box within one particular filter response corresponds to a 7 × 7 patch that has been
assigned to the most similar texton from the texton dictionary (Fig. 1). Finally, a frequency histogram is generated for one filter response image that represents
the frequency distribution of all the textons contained in the texton dictionary for a particular filter response. Subsequently, the histograms for all filter
responses were concatenated into a single feature vector
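The histogram step described in the Fig. 4 caption amounts to a nearest-texton assignment followed by a count. The sketch below is illustrative only: it assumes NumPy, uses random arrays in place of real patches and textons, and the names `texton_histogram` and `feature_vector` are hypothetical.

```python
import numpy as np

def texton_histogram(patches, dictionary):
    """Assign each patch vector to its closest texton (Euclidean distance)
    and count how often each texton is chosen."""
    d2 = ((patches[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.bincount(nearest, minlength=len(dictionary))

def feature_vector(patch_sets, dictionary):
    """Concatenate one histogram per filter response into a single vector."""
    return np.concatenate([texton_histogram(p, dictionary) for p in patch_sets])

rng = np.random.default_rng(0)
dictionary = rng.normal(size=(20, 49))                        # 20 textons of length 49
patch_sets = [rng.normal(size=(625, 49)) for _ in range(53)]  # one set per filter response
features = feature_vector(patch_sets, dictionary)
print(features.shape)
```

With 20 textons and 53 filter responses, the concatenated vector has 20 × 53 = 1060 entries, which is the feature dimension reported before attribute selection.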

specification leads to 20 textons (T1, T2, T3, … T20), of which 10 textons belong to the benign class (T1 … T10) and 10 to the malignant class (T11 … T20). Based on our experiments, 10 clusters is an appropriate cluster representation for 200 samples per class considering memory and computational expense (see Section 4).

In the work proposed by Varma and Zisserman [19, 20], the final model consisted of the texton frequency histograms, which represented the probability distribution of each texton for each texton model generation sample (therefore, the total number of models was equal to the total number of texton model generation samples that were used for model generation). In the proposed approach, by contrast, the final model comprises the textons (cluster centroids), which are used to generate the features and perform evaluation.

3.4 Model evaluation

For the evaluation, a feature set was generated for the data based on the model developed in Section 3.3. For feature-set generation, the same initial steps were repeated with the evaluation data: i.e. after getting filter responses for each RoI from the evaluation dataset, the filter response images were divided into 7 × 7 patches and subsequently converted into 1D vectors. The next step was to assign each data vector to the closest texton from the texton dictionary (generated in Section 3.3). The Euclidean distance was used as the metric to estimate the similarity. After that, frequency histograms were generated for each filter response, where each bin of the histogram represented the frequency of a particular texton in a particular filter response image. Fig. 4 shows the process of generating a frequency histogram from one of the filter response images from the evaluation data. Such histograms were generated for each of the 53 filter response images for each evaluation RoI, resulting in 53 histograms for a single evaluation image. The final feature vector was formed by aggregating the texton distribution for all 53 filter responses for a test RoI, resulting in a feature vector with dimension equal to 1060 (i.e. 20 × 53). The most salient features were extracted (to reduce the data dimensionality) using Weka [25], where the CfsSubsetEval attribute evaluator was used along with the BestFirst search method. CfsSubsetEval evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them, whereas the BestFirst search method searches the space of attribute subsets by greedy hill climbing with an added backtracking facility. In this way, the total number of attributes was reduced from 1060 to 17. Varma and Zisserman [19, 20] used the KNN approach for the final classification, where they used Chi-square statistics to compare the histogram corresponding to the evaluation data with

Table 1 Confusion matrix by using filter response patches

Actual class    Classified as benign    Classified as malignant
benign          41                      9
malignant       8                       42
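The summary figures implied by Table 1 can be recomputed directly from the four counts. The snippet below is a small worked check, assuming rows are the actual classes and columns the assigned classes, and treating malignant as the positive class for sensitivity and specificity (a convention the paper does not state explicitly).

```python
import numpy as np

# rows: actual (benign, malignant); columns: classified as (benign, malignant)
cm = np.array([[41, 9],
               [8, 42]])

accuracy = np.trace(cm) / cm.sum()       # (41 + 42) / 100
sensitivity = cm[1, 1] / cm[1].sum()     # malignant correctly classified: 42 / 50
specificity = cm[0, 0] / cm[0].sum()     # benign correctly classified: 41 / 50
misclassified = cm.sum() - np.trace(cm)  # 9 + 8

print(f"CA = {accuracy:.2f}, sensitivity = {sensitivity:.2f}, "
      f"specificity = {specificity:.2f}, misclassified = {misclassified}")
```

The accuracy of 0.83 and the 17 misclassified instances agree with the values reported in Section 4.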

Fig. 5  ROC curve for the proposed method plotting the false positive rate
against the true positive rate. The operating points on the curve where the
CA was 83% have been indicated
Fig. 6  Randomly selected correctly classified instances
(a), (b) Instances from benign class, (c), (d) Malignant RoIs
the histograms that have been generated as a model, whereas here we used a naive Bayes classifier with bootstrap aggregation to report the final evaluation results.

4 Results and discussion

We have evaluated the developed approach on a subset of the DDSM data, where the evaluation data consisted of 100 images (50 benign and 50 malignant). Evaluation data was prepared in a way that only one mass would be included as a test/training instance for one case (as mentioned in Section 3.1) and a 10-FCV scheme was employed. The obtained CA was 83% (±8.23), with Az equal to 0.89, which is comparable to state-of-the-art results (see Section 5 for comparison). The confusion matrix for all 100 evaluation cases is shown in Table 1. The results in Table 1 are accumulated over all 10 folds and therefore cover all 100 samples used for model evaluation.

The ROC curve for the proposed method can be found in Fig. 5. Some correctly classified instances can be seen in Fig. 6.

Evaluation results of the texton-based approach were influenced by the number of clusters (or textons) defined for the K-means clustering. We investigated the effect of texton dictionary size on the overall performance. As can be seen in Fig. 7, the best results in terms of CA (%) and Az value were obtained by setting the number of clusters equal to 10 per class (benign and malignant), with the size of the texton dictionary equal to 20. In the original work proposed by Varma and Zisserman [19], they mentioned that the overall classification results were improved by increasing the number of clusters per class.

Fig. 7  Classification accuracy (CA) and area under the ROC (Az) evaluation for different values for the number of clusters per class

Another important factor for the patch-based texton approach is the size of the patch used for the texton dictionary generation. For the proposed work, a patch size of 7 × 7 resulted in improved classification results as compared to other patch sizes (3 × 3, 5 × 5, 9 × 9 and 11 × 11). The results in terms of CA and Az for several patch sizes can be found in Table 2.

4.1 Discussion on misclassified instances

In total there were 17 misclassified instances (9 from the benign class and 8 from the malignant class). When the misclassified instances were investigated, it was found that most of them were misclassified due to noise artefacts or inclusion of tissue other than the breast region (i.e. pectoral muscle), both of which were rare in the dataset used. Two randomly selected misclassified RoIs for both classes can be found in Fig. 8.

Figs. 8a and b show benign RoIs that were misclassified by the proposed method. It can be seen that Fig. 8a contains a dark region that is not part of the breast area. The same aspect can be seen in Fig. 8d, where a malignant RoI was classified as benign. From Fig. 8b, it can be seen that the image contains the pectoral muscle, which carries different textural information compared to the breast area; hence the overall texton histogram was also different from the actual representation of the benign mass. Similarly, some noise aspects can be observed in Fig. 8c, where a malignant RoI was classified as benign.

5 Comparison

As the proposed method is derived from the texton-based approaches [20], this section presents a detailed comparison of the proposed work with the original texton-based approach and alternative techniques.

5.1 Traditional texton-based approach

According to the texton-based work [20], the overall classification was improved when raw data patches from the images (representing textures) were taken instead of using filter responses. Varma and Zisserman [20] concluded that filter responses were not necessary and the raw image data could be used to classify the texture classes by extracting very compact patches [20]. This section presents some classification results when raw image data was used instead of the filter response images.

To be consistent with the approach presented for the mass classification using filter responses, the dataset again comprised 200 RoIs (100 benign and 100 malignant) for model building and 100 (50 benign and 50 malignant) for evaluating the developed model. The original image RoIs (size equal to 225 ×

Table 2 Results for different patch sizes used

Patch size    CA, %    Az
3 × 3         75       0.81
5 × 5         80       0.83
7 × 7         83       0.89
9 × 9         70       0.78
11 × 11       67       0.69

The 7 × 7 row (shown in bold in the original) is the optimum patch size in terms of best classification accuracy.
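The patch counts quoted in Sections 3.3 and 5.1 follow from simple arithmetic on a 175 × 175 image, and the same arithmetic can be applied to the other patch sizes in Table 2. The sketch below is only a worked check. The use of integer division for patch sizes that do not divide 175 evenly is an assumption, since the paper does not say how such borders were handled, and the function names are hypothetical.

```python
def non_overlapping(image_size, patch):
    """Number of non-overlapping patch x patch tiles (partial tiles dropped)."""
    return (image_size // patch) ** 2

def overlapping(image_size, patch):
    """Number of patch x patch windows at stride 1."""
    return (image_size - patch + 1) ** 2

SIZE, FILTERS, IMAGES_PER_CLASS, TEXTONS = 175, 53, 100, 20

per_response = non_overlapping(SIZE, 7)                             # 25 * 25
per_class_filtered = per_response * FILTERS * IMAGES_PER_CLASS      # Section 3.3
per_roi_raw = overlapping(SIZE, 7)                                  # 169 * 169
per_class_raw = per_roi_raw * IMAGES_PER_CLASS                      # Section 5.1
feature_length = TEXTONS * FILTERS                                  # Section 3.4

for patch in (3, 5, 7, 9, 11):
    print(patch, non_overlapping(SIZE, patch), overlapping(SIZE, patch))
print(per_response, per_class_filtered, per_roi_raw, per_class_raw, feature_length)
```

For the 7 × 7 case this reproduces the figures stated in the paper: 625 patches per filter response, 3,312,500 filter-response patches per class, 28,561 overlapping raw patches per RoI, 2,856,100 raw patches per class and a 1060-dimensional feature vector.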

225) from the dataset were resized to 175 × 175 (for the filter-based method the size of the input image was the same, i.e. 225 × 225, and was reduced to 175 × 175 after getting filter responses).

Fig. 8  Randomly selected misclassified instances
(a), (b) Benign RoIs, (c), (d) RoIs from the malignant class

Subsequently, patches of size 7 × 7 were extracted from each RoI representing the mass region. Overlapping patches were extracted in order to get additional samples from the model building images to capture data variations. For a single RoI, 28,561 patches of size 7 × 7 were extracted. In total, 2,856,100 patches were extracted for each of the benign and malignant classes. For K-means clustering, k was set to 10, which resulted in 10 clusters for each class. The overall set-up is quite similar to the approach presented in this paper, except that the patches now represent the raw image data instead of the filter response images. After generating the texton dictionary, evaluation data was used with a similar processing pipeline. A histogram was generated for each evaluation RoI by assigning its patches to the closest textons. This histogram was then used as a feature for classifying masses, but the length of the feature vector in this case is 20 instead of 1060, as was the case when using filter response patches.

The classification setup was the same, i.e. using a naive Bayes classifier and a 10-FCV scheme. After attribute selection, the total number of attributes was reduced to 5. The final CA when using raw image patches was 70% with the Az value equal to 0.73. The confusion matrix for this experiment can be seen in Table 3, which shows a considerable difference in terms of true and false classified instances compared to the results in Table 1.

Table 3 Confusion matrix using raw image patches

Actual class    Classified as benign    Classified as malignant
benign          34                      16
malignant       14                      36

From the presented classification results for benign and malignant masses, unlike the material texture classification [20], patches from the filter responses are more useful than raw image patches. The likely reason is that both benign and malignant mass areas are very similar in terms of intensity values, but the filter responses are useful for producing more distinct features (e.g. boundaries and lines) to classify both classes.

5.2 Comparison with the alternative approaches

Table 4 summarises the existing work with respect to alternative approaches for mass classification in mammograms.

Table 4 Overview of existing techniques developed for mass classification in mammograms

Author                      Features                               Best reported results, %
Rangayyan et al. [13]       shape, textures                        CA = 95
Mudigonda et al. [7]        shape, texture                         CA = 82.1
Valarmathie et al. [11]     shape, margin and texture              CA = 94
Mu et al. [8]               shape, texture                         Az = 0.95
Rouhi et al. [12]           shape, texture                         CA = 96.47
Campos et al. [10]          ICA                                    CA = 97.3
Dong et al. [16]            mass area                              CA = 97.73
Boujelben et al. [14]       boundary of the mass area              CA = 97.9
Işikli Esener et al. [17]   LCP, statistical, frequency domain     CA = 94.67
Buciu and Gacsadi [18]      Gabor                                  Az = 0.78
Li et al. [22]              subsampled textons                     CA = 85.96
Varma and Zisserman [20]    traditional textons                    CA = 70
proposed approach           modified textons                       CA = 83

As discussed in Section 2, the shape of the mass is one of the important factors for the characterisation of benign and malignant masses in mammograms [6]. In the literature review of this paper, several methods have been discussed [7, 8, 11–13] that used the shape of the mass as a major feature (with other features such as texture, intensity etc.) and provided promising results for mammographic mass classification (benign versus malignant). In clinical practice, it is not always possible for the boundary of the mass region to be provided so that the shape features can be extracted and used for the classification. The current approach of using texture features (based on texton modelling) differs from the shape-based approaches presented in the literature review: it takes into account that the mass segmentation is not always available at the classification stage, and therefore only the texture of the mass region is used to extract the features.

Campos et al. [10] used ICA to extract the features that were used for classifying the patches extracted from the mammograms. They reported an accuracy of 97.3%; however, the ICA feature extraction seems to be based on all data and no training/testing separation is used. In [16], several features have been extracted from the segmented mass area and a CA of 97.73% has been reported as the best result based on a single fold, with average results just above 90%. In [11], the best CA has been reported to be equal to 98% using shape, margin and texture features; however, a very low CA was reported using only the texture features (∼60%) for the mass classification in mammograms. Similarly, good classification results have been reported by Boujelben et al. [14], where the boundary information of the mass region has been exploited to classify the RoIs as benign or malignant. Işikli Esener et al. [17] used LCP features in combination with the statistical and frequency domain features and reported a CA equal to 94.67% for classifying mammographic RoIs as normal, benign or malignant. In their work, they used the masses corresponding to only fatty tissues, whereas the dataset used for the current approach uses randomly selected masses that belong to different tissue densities and are not restricted to a particular type of tissue class. Other closely related work to the current approach in the literature is found in [18], where Gabor wavelets have been used to filter the images and then PCA was applied to reduce the data dimensionality. By using
IET Comput. Vis., 2018, Vol. 12 Iss. 8, pp. 1060-1066 1065
© The Institution of Engineering and Technology 2018
only the Gabor wavelets on the images, they reported results in terms of
Az equal to 0.78 for the classification of benign and malignant masses.
The current approach of using various filter responses in combination
with the texton-based approach provided results in terms of Az equal to
0.89, which is an improvement on the results achieved by Buciu and
Gacsadi [18] using only Gabor wavelets.
As mentioned in Section 2, texton-based work has been done in the past
for the classification of benign and malignant masses by Li et al. [22],
which showed a CA equal to 85.96%, comparable to that of the developed
approach.

6 Future directions

In the future, we will examine the effect of using different sizes of
texton dictionary (different numbers of clusters) to find the best
combination of patch size and texton dictionary size in terms of CA. In
the method proposed by Varma and Zisserman [19], CA improved as the
number of clusters increased, whereas in the current method the
classification performance degraded beyond a certain number of clusters.
Further statistical evaluation needs to be performed in order to explore
whether the difference between the texton-based approach [19] and the
current patch-based method is significant.
Deep learning is a new area of machine learning and several techniques
have been proposed in the field of CAD system development [26–28], giving
promising results. It should be noted that deep learning tends to rely on
large annotated datasets, which are not always available. In addition,
the best results tend to be obtained using a mixture of deep learning and
handcrafted features. In the presented work, the primary focus was on
exploiting traditional machine learning approaches (and features) for
providing improved results for the classification of benign and malignant
masses. In the future, we will investigate the effects of using deep
learned features in combination with the proposed feature set (texton
histograms) for classifying masses as benign or malignant using
additional data.

7 Conclusions

In conclusion, we have proposed a modified texton-based approach to
classify pre-detected mammographic abnormalities as benign or malignant.
Patches from filter responses were used to make the texton dictionary,
and texton frequency histograms from all 53 filter responses for each RoI
were aggregated to form features for classifying the benign and malignant
masses. The developed model was evaluated on a subset of the DDSM
dataset. Results were comparable to alternative state-of-the-art methods
[22].

8 References

[1] National Health Service-Breast Screening: ‘Professional guidance’, 31 August 2016. Available at https://www.gov.uk/government/collections/breast-screening-professional-guidance
[2] Tabár, L., Dean, P.B.: ‘Breast cancer-the art and science of early detection with mammography’ (Thieme, New York, 2005), ISBN: 3-13-131
[3] Djaroudib, K., Ahmed, A.T., Zidani, A.: ‘Textural approach for mass abnormality segmentation in mammographic images’, 2014, arXiv preprint arXiv:1412.1506
[4] Elter, M., Horsch, A.: ‘CADx of mammographic masses and clustered microcalcifications: a review’, Med. Phys., 2009, 36, (6), pp. 2052–2068
[5] Oliver, A., Freixenet, J., Marti, J., et al.: ‘A review of automatic mass detection and segmentation in mammographic images’, Med. Image Anal., 2010, 14, (2), pp. 87–110
[6] American College of Radiology BI-RADS Committee and American College of Radiology: ‘Breast imaging reporting and data system’ (American College of Radiology, Reston, VA, USA, 1998)
[7] Mudigonda, N.R., Rangayyan, R., Desautels, J.L.: ‘Gradient and texture analysis for the classification of mammographic masses’, IEEE Trans. Med. Imaging, 2000, 19, (10), pp. 1032–1043
[8] Mu, T., Nandi, A.K., Rangayyan, R.M.: ‘Classification of breast masses using selected shape, edge-sharpness, and texture features with linear and kernel-based classifiers’, J. Digit. Imaging, 2008, 21, (2), pp. 153–169
[9] Ball, J.E., Bruce, L.M.: ‘Digital mammographic computer aided diagnosis (CAD) using adaptive level set segmentation’. 29th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society, Lyon, France, 2007, pp. 4973–4978
[10] Campos, L., Silva, A., Barros, A.: ‘Diagnosis of breast cancer in digital mammograms using independent component analysis and neural networks’. Iberoamerican Congress on Pattern Recognition, Berlin, Germany, 2005, pp. 460–469
[11] Valarmathie, P., Sivakrithika, V., Dinakaran, K.: ‘Classification of mammogram masses using selected texture, shape and margin features with multilayer perceptron classifier’, Biomed. Res., 2016, pp. S310–S313
[12] Rouhi, R., Jafari, M., Kasaei, S., et al.: ‘Benign and malignant breast tumors classification based on region growing and CNN segmentation’, Expert Syst. Appl., 2015, 42, (3), pp. 990–1002
[13] Rangayyan, R.M., El-Faramawy, N.M., Desautels, J.L., et al.: ‘Measures of acutance and shape for classification of breast tumors’, IEEE Trans. Med. Imaging, 1997, 16, (6), pp. 799–810
[14] Boujelben, A., Chaabani, A.C., Tmar, H., et al.: ‘Feature extraction from contours shape for tumor analyzing in mammographic images’, Digit. Image Comput., Tech. Appl., 2009, pp. 395–399
[15] Ertas, G., Gulcur, H., Aribal, E., et al.: ‘Feature extraction from mammographic mass shapes and development of a mammogram database’. 2001 Proc. of the 23rd Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society, Istanbul, Turkey, 2001, Vol. 3, pp. 2752–2755
[16] Dong, M., Lu, X., Ma, Y., et al.: ‘An efficient approach for automated mass segmentation and classification in mammograms’, J. Digit. Imaging, 2015, 28, (5), pp. 613–625
[17] Işikli Esener, İ., Ergin, S., Yüksel, T.: ‘A new ensemble of features for breast cancer diagnosis’. 38th Int. Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 2015, pp. 1168–1173
[18] Buciu, I., Gacsadi, A.: ‘Directional features for automatic tumor classification of mammogram images’, Biomed. Signal Proc. Control, 2011, 6, (4), pp. 370–378
[19] Varma, M., Zisserman, A.: ‘A statistical approach to texture classification from single images’, Int. J. Comput. Vis., 2005, 62, (1–2), pp. 61–81
[20] Varma, M., Zisserman, A.: ‘A statistical approach to material classification using image patch exemplars’, IEEE Trans. Pattern Anal. Mach. Intell., 2009, 31, (11), pp. 2032–2047
[21] Kinoshita, S.K., Marques, P.A., Slaets, A.F.F., et al.: ‘Detection and characterization of mammographic masses by artificial neural network’, Digital Mammography, 1998, 13, pp. 489–490
[22] Li, Y., Chen, H., Rohde, G.K., Yao, C., et al.: ‘Texton analysis for mass classification in mammograms’, Pattern Recognit. Lett., 2015, 52, pp. 87–93
[23] Heath, M., Bowyer, K., Kopans, D., et al.: ‘The digital database for screening mammography’. Proc. of the 5th Int. Workshop on Digital Mammography, Medical Physics Publishing, Toronto, Canada, 2000, pp. 212–218
[24] Gabor, D.: ‘Theory of communication’, J. Inst. Electr. Eng. Part III, Radio Commun. Eng., 1946, 93, (26), pp. 429–441
[25] Eibe, F., Mark, A.H., Ian, H.W.: ‘The WEKA Workbench. Online appendix for ‘Data Mining: practical machine learning tools and techniques’’ (Morgan Kaufmann, Cambridge, MA, USA, 2016, 4th edn.)
[26] Jiao, Z., Gao, X., Wang, Y., et al.: ‘A deep feature based framework for breast masses classification’, Neurocomputing, 2016, 197, pp. 221–231
[27] Jadoon, M.M., Zhang, Q., Haq, I.U., et al.: ‘Three-class mammogram classification based on descriptive CNN features’, BioMed Res. Int., 2017, 2017, pp. 1–11
[28] Ribli, D., Horváth, A., Unger, Z., et al.: ‘Detecting and classifying lesions in mammograms with deep learning’, Sci. Rep., 2017, 8, Article number: 4165, p. 4165
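
As a supplementary illustration of the feature construction summarised in Section 7, the following is a minimal sketch and not the authors' implementation. It assumes that a texton dictionary is already available for each filter response (the clustering step is omitted), that each patch is assigned to its nearest texton by Euclidean distance, that histograms are normalised to frequencies, and that 'aggregated' means concatenated; none of these choices is confirmed by the paper. The names `nearest_texton`, `texton_histogram` and `roi_feature` are hypothetical, and the toy data uses 2 filter responses instead of 53.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_texton(patch: Vector, textons: Sequence[Vector]) -> int:
    """Index of the texton closest to `patch` (squared Euclidean distance, an assumption)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(textons)), key=lambda k: sq_dist(patch, textons[k]))


def texton_histogram(patches: Sequence[Vector], textons: Sequence[Vector]) -> List[float]:
    """Normalised texton frequency histogram for the patches of one filter response."""
    counts = [0] * len(textons)
    for patch in patches:
        counts[nearest_texton(patch, textons)] += 1
    total = len(patches)
    return [c / total for c in counts] if total else [0.0] * len(textons)


def roi_feature(patches_per_response: Sequence[Sequence[Vector]],
                textons_per_response: Sequence[Sequence[Vector]]) -> List[float]:
    """Concatenate the per-response histograms into one feature vector for an RoI
    (concatenation is an assumed reading of 'aggregated')."""
    feature: List[float] = []
    for patches, textons in zip(patches_per_response, textons_per_response):
        feature.extend(texton_histogram(patches, textons))
    return feature


# Toy data: 2 filter responses, 2 textons each, one-dimensional "patches".
textons_per_response = [[(0.0,), (1.0,)], [(0.0,), (10.0,)]]
patches_per_response = [[(0.1,), (0.9,), (0.8,), (1.2,)], [(9.0,), (1.0,)]]
feature = roi_feature(patches_per_response, textons_per_response)
print(feature)  # [0.25, 0.75, 0.5, 0.5]
```

A feature vector of this form, one per RoI, is what would then be passed to a classifier such as naive Bayes; with 53 filter responses the vector length would be 53 times the dictionary size under the assumptions above.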
