4, 2011 339
S. Thamarai Selvi
Department of Computer Technology,
Madras Institute of Technology,
Anna University, India
E-mail: stselvi@annauniv.edu
R. Malmathanraj*
Department of ECE,
National Institute of Technology,
Tiruchirappalli, India
E-mail: malmathan@gmail.com
*Corresponding author
1 Introduction
Breast cancer is one of the leading cancers among women in developed countries and is
the cause of death in approximately 20% of all females who die from cancer in these
countries. The World Health Organization’s International Agency for Research on Cancer
in Lyon, France, estimates that more than 150,000 women worldwide die of breast cancer
each year. In India, breast cancer accounts for 23% of all the female cancers in
metropolitan cities like Mumbai, Calcutta and Bangalore. Although the incidence is
lower in India than in the developed countries, the burden of breast cancer in India is
alarming. Primary prevention is not possible since the cause of the disease is still not known.
Survival from breast cancer is directly related to the stage at diagnosis. Thus, detection
of early and subtle signs of breast cancer requires high-quality images and skilled
radiologists. Most imaging studies and biopsies of the breast are conducted using
mammography or ultrasound and, in some cases, Magnetic Resonance Imaging (MRI).
MRI is excellent at imaging the augmented breast, including both the breast implant itself
and the breast tissue surrounding the implant. MRI is also useful for staging
breast cancer, determining the most appropriate treatment, and for patient follow-up
after breast cancer treatment. One notable feature of MR images is their contrast. Thus,
MR images help oncologists and biomedical engineers alike in generating accurate
results (Weigel et al., 2010).
It has been shown that double reading of mammograms by two radiologists reduces the
missed detection rate, but at considerable expense. The estimated interobserver variation
rate of radiologists in breast cancer screening is only about 65–75%, but the performance
would be improved if they were prompted with the possible locations of abnormalities.
Because of the nature of medical images, their classification still faces challenges
such as:
• Low resolution and strong noise are two common characteristics of most medical
images. With these characteristics, medical images cannot be precisely segmented,
and the visual content of their features cannot be precisely extracted.
• Medical images are digitally represented in a multitude of formats, depending on their
modality and the scanning device used.
Another characteristic of medical images is that many images are represented in grey
level rather than colour. Since masses are often indistinguishable from the surrounding
parenchymal region, the automated mass detection and classification is more
challenging.
To overcome these difficulties, there have been many attempts to assist
radiologists by prompting sites of potential abnormalities using Computer-Aided Detection
(CAD) tools. Currently, several image processing methods have been proposed for the
detection of tumours in mammograms. Various technologies such as fractal analysis
(Yang and Yan, 2000), multiresolution-based image processing (Brazokovic and
Neskovic, 1993; Chen and Lee, 1997) and Markov Random Field (MRF) (Li et al., 1995)
Mammogram tumour classification using Q learning 341
have been used. Brazokovic and Neskovic (1993) described an algorithm for tumour
detection in mammograms based on fuzzy pyramid linking and multiresolution
segmentation. Thresholding techniques use only grey-level information (Hao et al., 1999).
Consequently, this work aims to increase efficiency by implementing a novel approach
to medical image segmentation that combines a Q-learning algorithm for multilevel
thresholding, feature extraction and a neural network classifier. In this work,
a Reinforcement Learning (RL) method for maximum entropy-based
thresholding is implemented to segment the tumour region in the mammograms.
The thresholding process can be divided into two phases. In the first phase, Q-values
for selected actions are updated, starting from the initial state and ending at the final
state. This is repeated for a number of epochs that is given as input. Actions are selected
according to the action selection rule. After learning is complete, the optimum thresholds
are computed using the updated Q-values. In the second phase, the input image is
segmented into multiple binary images according to the optimal thresholds. The goal
is to maximise the cumulative entropy using RL. The rest of the paper is arranged as
follows: Section 2 describes multilevel thresholding, Section 3 the features derived,
Section 4 the classifiers, Section 5 the results and discussion, and Section 6 concludes
the paper.
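The second phase described above can be illustrated with a minimal sketch. It assumes that "multiple binary images" means one boolean mask per grey-level band between consecutive thresholds; the function name `split_by_thresholds` and the tiny test image are hypothetical and are not taken from the original implementation.

```python
import numpy as np

def split_by_thresholds(image, thresholds):
    """Return one boolean mask per grey-level band defined by the thresholds.

    Assumption: band 0 holds pixels below the first threshold and band k holds
    pixels with t[k-1] <= value < t[k]; the last band holds the rest.
    """
    t = np.sort(np.asarray(thresholds))
    bands = np.digitize(image, t)          # band index of every pixel
    return [bands == k for k in range(len(t) + 1)]

# Hypothetical 2 x 3 image, segmented with the thresholds quoted for Figure 5.
image = np.array([[10, 80, 140],
                  [180, 240, 76]], dtype=np.uint8)
masks = split_by_thresholds(image, [76, 133, 163, 195])
```

Each pixel falls in exactly one mask, so the masks can be processed independently when features are extracted per band.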
2 Multilevel thresholding
\[
H(t, L) = -\sum_{i=t}^{L-1} \frac{P_i}{\omega_1} \ln \frac{P_i}{\omega_1}, \qquad
\omega_1 = \sum_{i=t}^{L-1} P_i,
\]
and the optimal threshold is the grey level that maximises f1. It can be extended to the
multilevel thresholding case with n thresholds as follows. Maximise
\[
f_n(t_1, t_2, \ldots, t_n) = H(0, t_1) + H(t_1, t_2) + \cdots + H(t_{n-1}, t_n) + H(t_n, L)
= \sum_{i=0}^{n} H(t_i, t_{i+1}), \tag{2}
\]
with t_0 = 0 and t_{n+1} = L.
The optimal n thresholds are those grey levels that maximise fn. Let Pij = h(i, j)/N
for i, j = 0, 1, …, L − 1. The optimal threshold pair (m, n) is obtained by
maximising
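\[
-f_1(m, n) = H((0, 0), (m, n)) + H((m, n), (L, L)), \tag{3}
\]
where
\[
H((0, 0), (m, n)) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \frac{P_{ij}}{\omega_0} \ln \frac{P_{ij}}{\omega_0}, \qquad
\omega_0 = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} P_{ij},
\]
\[
H((m, n), (L, L)) = \sum_{i=m}^{L-1} \sum_{j=n}^{L-1} \frac{P_{ij}}{\omega_1} \ln \frac{P_{ij}}{\omega_1}, \qquad
\omega_1 = \sum_{i=m}^{L-1} \sum_{j=n}^{L-1} P_{ij}.
\]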
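As a concrete reading of the 1D objective in equation (2), the following sketch evaluates the segment entropy H(a, b) on a normalised histogram and finds a single threshold by exhaustive search. The function names and the toy histogram are hypothetical, and exhaustive search is used here only as a reference, not as the learning method of this paper.

```python
import numpy as np

def segment_entropy(p, a, b):
    """H(a, b): entropy of the normalised histogram p over grey levels a..b-1."""
    seg = p[a:b]
    w = seg.sum()                      # omega: probability mass of the segment
    if w <= 0:
        return 0.0                     # empty segment contributes no entropy
    q = seg[seg > 0] / w               # skip zero bins, since 0 * ln 0 = 0
    return float(-(q * np.log(q)).sum())

def objective(p, thresholds):
    """f_n(t_1..t_n) = sum_i H(t_i, t_{i+1}) with t_0 = 0 and t_{n+1} = L."""
    edges = [0, *thresholds, len(p)]
    return sum(segment_entropy(p, lo, hi)
               for lo, hi in zip(edges[:-1], edges[1:]))

def best_single_threshold(p):
    """Exhaustive search for the bilevel case (n = 1)."""
    return max(range(1, len(p)), key=lambda t: objective(p, [t]))

# Hypothetical 12-level histogram with two flat clusters separated by a gap.
hist = np.array([4, 4, 4, 4, 0, 0, 0, 0, 2, 2, 2, 2], dtype=float)
p = hist / hist.sum()
t = best_single_threshold(p)
```

For this toy histogram any threshold inside the empty gap separates the two flat clusters, so several grey levels share the maximum value of the objective.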
Algorithm
The first dimension indicates the serial order of the threshold and the second dimension
represents its grey level.
The agent starts from s_{0,0} and repeatedly chooses an action and moves to another state
until the state s_{n+1,L} is observed. Now, the key components of the Q-learning algorithm
for the 1D entropy thresholding are defined as follows.
344 S. Thamarai Selvi and R. Malmathanraj
where Q(s_{n+1,L}, a_j) = 0 sets the boundary case to avoid being undefined.
Now, equation (2) can be rewritten as
\[
\max_{t_1, t_2, \ldots, t_n} f_n(t_1, t_2, \ldots, t_n) = \max_{a_j} Q(s_{0,0}, a_j). \tag{5}
\]
In other words, learning the optimal selection of n thresholds to maximise the cumulative
entropy is equivalent to learning the optimal policy of choosing a sequence of n actions
to maximise the cumulative reward. For the bilevel thresholding case, the recursive
definition of Q(s_{k,i}, a_j) is given by
\[
Q(s_{k,i}, a_j) =
\begin{cases}
0 + \max_{a_l} Q(s_{k+1,j}, a_l) & \text{if } k = 0, 1, \\
f_1(i, j) + \max_{a_l} Q(s_{k+1,j}, a_l) & \text{if } k = 2,
\end{cases} \tag{6}
\]
and Q(s_{3,L}, a_j) = 0 sets the boundary case. Thus, the objective function of the 2D entropy
bilevel thresholding can be rewritten as
\[
\max_{m, n} f_1(m, n) = \max_{a_j} Q(s_{0,0}, a_j). \tag{7}
\]
The algorithm is repeated for the given maximum number of iterations. Then, the
n sequential actions chosen by the learned optimal policy determine the n optimal
thresholds (Figure 1).
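The learning loop for the 1D case can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the reward for moving from grey level i to grey level j is taken to be the segment entropy H(i, j), the update uses a learning rate of one (the environment is deterministic), and epsilon-greedy exploration stands in for the unspecified action selection rule. All names and the toy histogram are hypothetical.

```python
import numpy as np

def entropy(p, a, b):
    """H(a, b): entropy of the normalised histogram p over grey levels a..b-1."""
    seg = p[a:b]
    w = seg.sum()
    if w <= 0:
        return 0.0
    q = seg[seg > 0] / w
    return float(-(q * np.log(q)).sum())

def q_learning_thresholds(p, n, epochs=5000, epsilon=0.5, seed=0):
    """Learn n thresholds; state s_{k,i} = k thresholds chosen, the last at level i."""
    rng = np.random.default_rng(seed)
    L = len(p)
    Q = np.zeros((n + 1, L + 1, L + 1))    # Q[k, i, j] approximates Q(s_{k,i}, a_j)

    def actions(k, i):
        # The (n+1)-th action is forced to L; earlier ones leave room for the rest.
        if k == n:
            return [L]
        return list(range(i + 1, L - (n - k) + 1))

    def greedy(k, i):
        acts = actions(k, i)
        return acts[int(np.argmax([Q[k, i, j] for j in acts]))]

    for _ in range(epochs):                # phase one: update the Q-values
        k, i = 0, 0
        while k <= n:
            explore = rng.random() < epsilon
            j = int(rng.choice(actions(k, i))) if explore else greedy(k, i)
            if k == n:
                future = 0.0               # boundary case Q(s_{n+1,L}, .) = 0
            else:
                future = max(Q[k + 1, j, l] for l in actions(k + 1, j))
            Q[k, i, j] = entropy(p, i, j) + future
            k, i = k + 1, j

    thresholds, i = [], 0                  # read the thresholds off the greedy policy
    for k in range(n):
        i = greedy(k, i)
        thresholds.append(i)
    return thresholds

# Hypothetical 16-level histogram with three clusters.
hist = np.array([5, 9, 6, 1, 0, 1, 7, 10, 6, 1, 0, 2, 8, 9, 3, 1], dtype=float)
p = hist / hist.sum()
thresholds = q_learning_thresholds(p, n=2)
```

Because the table holds one entry per (k, i, j) triple, this sketch is only practical for a small number of grey levels; a 256-level image would need a more economical state representation or more epochs.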
3 Features derived
Intensity features                       GLDS features                    Shape features
FI1: Contrast measure of tumour          FT1: Contrast                    FG1: Convexity
FI2: Average grey level of tumour        FT2: Angular second moment       FG2: Rectangularity
FI3: Standard deviation                  FT3: Entropy                     FG3: Perimeter
FI4: Skewness of tumour                  FT4: Mean                        FG4: Centroid
FI5: Kurtosis of tumour                  RLS features:                    FG5: Minor axis length
FI6: A set of features composed          FT5: Short runs emphasis         FG6: Major axis length
     of third-order normalised
     Zernike moments
FI7: Mean/variance                       FT6: Long runs emphasis          FG7: Eccentricity
FI8: Mean absolute deviation             FT7: Grey-level non-uniformity   FG8: Orientation
                                         FT8: Run length non-uniformity   FG9: Fourier descriptor
                                                                          FG10: Euler number
                                                                          FG11: Solidity
13 Compactness is defined as the ratio of area of tumour region to the area of the
smallest rectangle that circumscribes the tumour region.
14 Perimeter.
15 Skewness is a measure of symmetry, or more precisely, the lack of symmetry.
A distribution, or data set, is symmetric if it looks the same to the left and right of the
centre point.
16 Kurtosis is a measure of whether the data are peaked or flat relative to a normal
distribution. That is, data sets with high kurtosis tend to have a distinct peak near
the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis
tend to have a flat top near the mean rather than a sharp peak.
17 Mean/variance.
18 Mean absolute deviation.
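The intensity statistics listed above can be computed as in the sketch below. It assumes the usual moment-based definitions (population standard deviation, third and fourth standardised moments), which the text does not spell out; the function name and the sample values are hypothetical.

```python
import numpy as np

def intensity_features(pixels):
    """Moment-based intensity statistics of the grey values in a region."""
    x = np.asarray(pixels, dtype=float).ravel()
    mean = x.mean()
    std = x.std()                                   # population standard deviation
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    return {
        "mean": float(mean),
        "std": float(std),
        "skewness": float((z ** 3).mean()),         # 0 for a symmetric distribution
        "kurtosis": float((z ** 4).mean()),         # 3 for a normal distribution
        "mean_abs_dev": float(np.abs(x - mean).mean()),
    }

feats = intensity_features([1, 2, 3, 4, 5])
```

With this convention a flat-topped distribution gives a kurtosis below 3 and a sharply peaked, heavy-tailed one gives a value above 3, which matches the qualitative description in item 16.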
Texture features
• Mean gradient within the current region measures the average value of the gradient in
each region
\[
m_{wg} = \frac{1}{N} \sum_{k=1}^{N} g_k. \tag{8}
\]
• Mean gradient of the region boundary measures the average value of the gradient
on the boundary
\[
m_g = \frac{1}{N'} \sum_{k=1}^{N'} g'_k. \tag{9}
\]
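Equations (8) and (9) can be sketched as follows, assuming that g_k is the gradient magnitude at pixel k of the region and that the boundary consists of the region pixels with at least one 4-neighbour outside the region. Both assumptions, and the helper names, are illustrative and are not taken from the paper.

```python
import numpy as np

def region_boundary(mask):
    """Region pixels that have at least one 4-neighbour outside the region."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

def mean_gradients(image, mask):
    """Return (m_wg, m_g): mean gradient magnitude in the region and on its boundary."""
    gy, gx = np.gradient(image.astype(float))   # derivatives along rows and columns
    g = np.hypot(gx, gy)                        # gradient magnitude per pixel
    m_wg = float(g[mask].mean())                # equation (8)
    m_g = float(g[region_boundary(mask)].mean())  # equation (9)
    return m_wg, m_g

# Hypothetical test image: a horizontal ramp with slope 2 and a 4 x 4 square region.
image = np.tile(2.0 * np.arange(6), (6, 1))
mask = np.zeros((6, 6), dtype=bool)
mask[1:5, 1:5] = True
m_wg, m_g = mean_gradients(image, mask)
```

On a uniform ramp both measures coincide; on a real mass, a boundary mean well above the region mean indicates a sharply delimited region.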
4 Classifiers
Once the features related to masses are extracted and selected, the features are input
into a classifier to classify the detected suspicious areas into normal tissues, benign
masses, or malignant masses. The classifiers that have been used are Back Propagation
Network (BPN), Radial Basis Function (RBF) and Learning Vector Quantisation (LVQ)
neural networks, and Support Vector Machines (SVMs).
where g(x) is a mapping function that maps x into the l-dimensional space, w is the
l-dimensional vector and b is a scalar. To separate data linearly, the decision function
satisfies the following condition:
\[
y_i \left( w^T g(x_i) + b \right) \geq 1 \quad \text{for } i = 1, \ldots, M. \tag{16}
\]
If the problem is linearly separable in the feature space, there are an infinite number
of decision functions that satisfy this condition. Among them, the hyperplane that has
the largest margin between the two classes is required. The margin is the minimum
distance from the separating hyperplane to the input data, and this is given by
|D(x)|/||w||. The separating hyperplane with the maximum margin is called the optimal
separating hyperplane. Assuming that the margin is ρ, the following condition needs to
be satisfied: y_i D(x_i)/||w|| ≥ ρ for i = 1, …, M, which leads to minimising ½ w^T w.
The optimal separating hyperplane is determined so that the
maximisation of the margin and the minimisation of the training error are both achieved.
When p = 1, the SVM is called the L1 soft margin SVM (L1-SVM), and when p = 2, the L2 soft
margin SVM (L2-SVM), where p is the exponent of the slack variables in the soft-margin
objective.
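The margin condition can be checked numerically with a small sketch. It assumes the identity mapping g(x) = x and a decision function of the form D(x) = w^T x + b; the data points and the hyperplane are hypothetical and chosen so that the arithmetic is easy to follow.

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Smallest value of y_i * D(x_i) / ||w|| over the data, with D(x) = w.x + b."""
    d = y * (X @ w + b)
    return float(d.min() / np.linalg.norm(w))

# Hypothetical, linearly separable toy data with labels +1 and -1.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([0.5, 0.0])
b = 0.0
margin = geometric_margin(w, b, X, y)
```

Here the closest points satisfy y_i D(x_i) = 1, so the margin equals 1/||w||; this is why maximising the margin is equivalent to minimising ½ w^T w under condition (16).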
Pattern classification was carried out by using the BPN, RBF, LVQ and SVM
architectures. The set of feature vectors was split at random into two parts: training
and test. Both the training and test sets consisted of normal and abnormal feature
vectors. The feature vectors were prepared for pattern classification as described in
Section 3. The number of input nodes in the network is equal to the number of features,
and the number of output nodes is equal to the number of target classes. The algorithm
was implemented in Matlab 7.1 (Mathworks, Natick, MA) on a desktop computer
(3 GHz Pentium IV processor and 3 GB RAM). For classification, the best
three-hidden-layer MLP NN (17-19-20-6-1) with tanh transfer functions in the hidden
layers was used. The classifier delivered the best performance when the tanh activation
function was also used for the neurons of the output layer. This is expected because,
for classification, the output processing must be non-linear in order to generate
arbitrarily complex decision regions. In the
LVQ neural network, the codebook vectors are initialised with a set of vectors
chosen from the training data, one at a time. All the entries used for initialisation must
fall within the borders of the corresponding classes, and this is checked by the K-nearest
neighbour (K-nn) algorithm. The accuracy of classification may depend on the number of
codebook entries allocated to each class. Several SVM parameters were varied to choose
the best setting: the regularisation parameter C from 1 to 10^5, the degree of the
polynomial kernel from 0 to 9 and the RBF kernel parameter gamma (γ) from 0.13 to 2.5.
The ROC curve, shown in Figure 2, is one of the best ways to
evaluate a classifier. ROC methodology is appropriate in situations where there are two
possible truth states (i.e., diseased/normal, event/non-event). From the results of the ROC
analyses, a reasonable trade-off between specificity and sensitivity is observed. For a
perfect classifier, the area under the ROC curve must approach unity. The sensitivity and
specificity values represent a measure of the classification accuracy that takes into
account the different diagnostic consequences of a false diagnosis on a malignant
abnormality and on a benign one. True Negatives (TNs) are correct benign diagnoses;
False Positives (FPs) are benign cases incorrectly diagnosed as malignant. Sensitivity is
a measure of detecting cancer (TP/(TP + FN)) whereas specificity is a measure of the classification
where
MC: Number of malignant cases in the test set
BC: Number of benign cases in the test set
NIm: Total number of images in the test case.
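The ROC quantities discussed in this section can be computed from confusion counts as in the sketch below. Sensitivity (the True Positive Fraction, TPF) follows the TP/(TP + FN) expression given in the text; the False Positive Fraction FPF = FP/(FP + TN) = 1 − specificity is the standard definition and is assumed here. The label vectors are hypothetical, with 1 for malignant and 0 for benign.

```python
def roc_point(y_true, y_pred):
    """Return (TPF, FPF) for binary labels, where 1 = malignant and 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpf = tp / (tp + fn)          # sensitivity
    fpf = fp / (fp + tn)          # 1 - specificity
    return tpf, fpf

# Hypothetical test set: 4 malignant and 6 benign cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
tpf, fpf = roc_point(y_true, y_pred)
```

Sweeping the decision threshold of a classifier and plotting the resulting (FPF, TPF) pairs traces out one ROC curve.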
Figure 2 Receiver Operating Characteristic (ROC) curves for BPN, RBF, LVQ and SVM
architectures (see online version for colours)
The images were thresholded by using the sample images. Consider the image
mdb001 from the MIAS database, shown in Figure 4, which was segmented with the
Q-learning-based multilevel thresholding method.
The thresholds obtained for the mammogram image mdb001 in the MIAS database are
T1, T2, T3, T4 and T5. Different sets of features were extracted from the mdb001 image
with grey values above threshold T1, between T1 and T2, above T2, between T4 and T5 and
above T5. Figure 3 shows the Monalisa image segmented with multiple thresholds 75, 130,
171 and 232. Figure 5 shows a mammogram image segmented with multiple thresholds
76, 133, 163 and 195.
Then, the TPF and FPF values were calculated for the BPN, RBF, LVQ and SVM
architectures under consideration, as given in Tables 1 and 2.
Figure 3 Monalisa image segmented with multiple thresholds 75, 130, 171 and 232 (see online
version for colours)
Figure 4 Mammogram image segmented with multiple thresholds 55, 77, 130 and 164
Figure 5 Mammogram image segmented with multiple thresholds 195, 163, 133 and 76
Table 1 True Positive Fraction and False positive Fraction values for BPN, LVQ, RBF and
SVM architectures
Sl. no.  Image name                        NN architecture  FS 1          FS 2          FS 3
                                                            TPF    FPF    TPF    FPF    TPF    FPF
1        mdb 001 with thresholds           BPN              0.601  0.40   0.5    0      0.80   0.02
         55, 130, 164 and 192              RBF              0.62   0.59   0.7    0      0.82   0.03
                                           LVQ              0.73   0.70   0.9    0.4    0.85   0.12
                                           SVM              0.80   0.28   0.92   0.1    0.91   0.13
2        mdb 002 with thresholds           BPN              0.570  0.497  0.7    0      0.75   0.06
         76, 133, 163 and 195              RBF              0.61   0.52   0.71   0.15   0.75   0.06
                                           LVQ              0.71   0.69   0.8    0.3    1      0
                                           SVM              0.83   0.22   0.97   0.2    1      0
3        mdb 005 with thresholds           BPN              0.63   0.39   0.77   0.29   0.75   0.18
         70, 123, 157 and 200              RBF              0.69   0.40   0.86   0.19   0.75   0.18
                                           LVQ              0.75   0.31   0.87   0.09   1      0.22
                                           SVM              0.82   0.02   0.91   0.04   1      0.03
Table 2 Summary of ANN performance with various features and different architectures
Sl. no.  Features used  Architecture  No. of cases (malignant/benign)  Training epochs  OP rate  Time
1        F1             BPN           108 (46, 62)                     3000             46.7     0.73
                        RBF           108 (46, 62)                     500              57.2     0.40
                        LVQ           108 (46, 62)                     50               51.5     0.29
                        SVM           108 (46, 62)                     50               46.6     0.27
2        F2             BPN           108 (46, 62)                     2700             54.2     0.69
                        RBF           108 (46, 62)                     300              47.9     0.45
                        LVQ           108 (46, 62)                     50               47.0     0.32
                        SVM           108 (46, 62)                     45               46.6     0.20
3        F3             BPN           108 (46, 62)                     3200             68.9     0.71
                        RBF           108 (46, 62)                     400              63.2     0.56
                        LVQ           108 (46, 62)                     75               50.4     0.41
                        SVM           108 (46, 62)                     50               49.5     0.31
4        F4             BPN           108 (46, 62)                     3400             70.2     2.16
                        RBF           108 (46, 62)                     500              70.8     0.75
                        LVQ           108 (46, 62)                     75               71.0     0.51
                        SVM           108 (46, 62)                     50               73.6     0.20
5        F5             BPN           108 (46, 62)                     2500             71.1     0.43
                        RBF           108 (46, 62)                     475              73.2     0.29
                        LVQ           108 (46, 62)                     65               73.5     0.28
                        SVM           108 (46, 62)                     50               75.0     0.19
6 Conclusion
The aim of this work is to develop a robust algorithm for the segmentation of masses in MR
images, which is a unique challenge in mammogram tumour segmentation. The results
show that this algorithm can automatically identify the tumour lesions present in the MR
images. No intervention by a radiologist is required, and the time is reduced to
45–60 s from the 5–10 min that manual segmentation takes. This work may help
physicians and radiologists in classifying tumours as benign or malignant using MR
images.
References
Brazokovic, D. and Neskovic, M. (1993) ‘Mammogram screening using multiresolution-based
image segmentation’, Int. J. Pattern Recog. Artif. Intelligence, Vol. 7, pp.1437–1460.
Chen, C.H. and Lee, G.G. (1997) ‘On digital mammogram segmentation and microcalcification
detection using multiresolution wavelet analysis’, Graphical Models Image Processing,
Vol. 59, pp.349–364.
Cheng, H.D., Cai, X., Chen, X.W., Hu, L. and Lou, X. (2003) ‘Computer-aided detection and
classification of microcalcifications in mammograms: a survey’, Pattern Recognition, Vol. 36,
pp.2967–2991.
Hao, X., Gao, S. and Gao, X. (1999) ‘A novel multi-scale nonlinear thresholding method for
ultrasonic speckle suppressing’, IEEE Trans. Med. Imaging, Vol. 8, pp.787–794.
Kapur, J.N., Sahoo, P.K. and Wong, A.K.C. (1985) ‘A new method for grey-level picture
thresholding using the entropy of the histogram’, Comput. Vision Graphics Image Process,
Vol. 29.
Li, H.D., Kallergi, M., Clarke, L.P., Jain, V.K. and Clark, R.A. (1995) ‘Markov random field for
tumor detection in digital mammography’, IEEE Trans. Med. Imag., Vol. 14, pp.565–576.
Pun, T. (1980) ‘A new method for gray level picture thresholding using the entropy of the
histogram’, Signal Process, Vol. 2, No. 3, pp.223–237.
Schölkopf, B., Sung, K.K., Burges, C., Girosi, F., Niyogi, P., Poggio, T. and Vapnik, V. (1997)
‘Comparing support vector machines with Gaussian kernels to radial basis function
classifiers’, IEEE Trans. Signal Process, Vol. 45, pp.2758–2765.
Suykens, J.A.K. and Vandewalle, J. (1999) ‘Least squares support vector machine classifiers’,
Neural Process. Lett., Vol. 9, pp.293–300.
Vapnik, V. (1995) The Nature of Statistical Learning Theory, Springer-Verlag, New York.
Weigel, S., Schrading, S., Arand, B., Bieling, H., König, R., Tombach, B., Leutner, C.,
Rieber-Brambs, A., Kuhl, C., Nordhoff, D., Heindel, W., Reiser, M. and Schild, H. (2010)
‘Prospective multicenter cohort study to refine management recommendations for women at
elevated familial risk of breast cancer’, Journal of Clinical Oncology, Vol. 28, No. 9,
pp.1450–1457.
Yang, Y. and Yan, H. (2000) ‘An adaptive logical method for binarization of degraded document
images’, Pattern Recognition, Vol. 33, No. 5, pp.787–807.