
PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY VOLUME 36 DECEMBER 2008 ISSN 2070-3740

A Soft-Decision Approach for Microcalcification Mass Identification from Digital Mammogram


B.Surendiran, A.Vadivel, Henry Selvaraj
Abstract: Breast cancer is one of the major causes of fatality among women aged 40 and above. Digital mammography is used by radiologists for analysis and interpretation of cancer. Visual reading and interpretation of mammograms is a very demanding and expensive job. Even well-trained experts may have an inter-observer variation rate of 65-75 percent. Computer Aided Diagnosis (CAD) systems have been developed to complement radiologists in interpreting mammograms for mass detection and identification of calcification. Thus, it is very important to develop CAD systems that can identify malignant lesions effectively. A combination of a CAD scheme and expert knowledge would effectively improve the rate and accuracy of mass detection. We use a soft-decision approach for identifying the microcalcification mass present in digital mammograms. A suitable clustering algorithm is applied to partition the digital mammogram into meaningful regions. During the post-processing phase, the background region is identified and excluded from further processing. The Coefficient of Variation (CV) of the regions of the partitioned mammogram is calculated, and the microcalcification lesion present in the mammogram is identified. The experimental results are found to be encouraging.

Keywords: Mammogram, Soft-decision, Microcalcification, Gray Weight
I. INTRODUCTION

Breast cancer occurs in over eight percent of women during their lifetime and has become one of the leading causes of death [4]. Various studies have noted a positive association between tissue type and potential breast cancer risk [6],[13]. Further, women who have breast cancer are prone to developing contralateral cancers in the other breast [8],[10]. However, distinguishing a new primary from a metastasis is not always possible because their features are similar. Asymmetry of the breast parenchyma between the two sides has been found to be one of the useful signs for detecting primary breast cancer [7]. While various methods have been proposed for early detection and screening of breast cancer, mammography is considered one of the most effective [1].
B.Surendiran is currently working as research scholar at National Institute of Technology Tiruchirappalli, India. (Email:405107004@nitt.edu). A Vadivel is with the Department of Computer Applications, National Institute of Technology Tiruchirappalli, India. (Corresponding Author: Email:vadi@nitt.edu ). Henry Selvaraj is with the Department of Electrical and Computer Engineering, University of Nevada, Las Vegas, Nevada, United States of America, 89154 (Email: selvaraj@unlv.nevada.edu)

Two important early signs of the disease are microcalcifications and masses [2]. Of these two signs, masses are considered more difficult to detect than microcalcifications, since their low-level features are usually obscured by, or similar to, normal breast parenchyma. In addition, masses are quite thin and often occur in dense areas of the breast tissue. They have smoother boundaries than microcalcifications and have shapes such as circumscribed, spiculated, lobulated or ill-defined. The circumscribed ones usually have distinct boundaries, are 2-30 mm in diameter and are high-density radiopaque. The spiculated ones have rough, star-shaped boundaries, and the lobulated ones have irregular shapes [3]. Masses must be classified as benign or malignant to improve the biopsy yield ratio, and this classification is based on certain properties of the respective region. While radiopaque masses with more irregular shapes are usually defined as malignant, regions combined with radiolucent shapes are defined as benign [14]. The content of a mammogram can be differentiated into four levels of intensity, in increasing order: background, fat tissue, breast parenchyma and calcifications. Masses develop from the epithelial and connective tissues of the breast, and their densities on mammograms blend inseparably with the parenchymal pattern. From a medical viewpoint, visually reading and interpreting mammograms is considered a very demanding job for radiologists. Their judgment essentially depends on training, experience and subjective criteria. Even well-trained experts may have an inter-observer variation rate of 65-75 percent.
Computer Aided Diagnosis (CAD) systems have been developed to complement radiologists in interpreting mammograms for mass detection and identification of calcification. CAD gives an edge in diagnosis, particularly since 65-90 percent of the biopsies of suspected cancers have been observed to turn out to be benign. Thus, it is very important to develop CAD systems that can distinguish between benign and malignant lesions effectively. Combining a CAD scheme with expert knowledge would effectively improve the detection accuracy for masses. While the detection sensitivity without CAD is found to be 80 percent, with CAD it is found to be up to 90 percent [5].

Generally, most existing mass detection CAD schemes involve phases such as digitizing mammograms, image preprocessing, image segmentation, feature extraction and selection, classification and evaluation. First, image preprocessing is carried out, in which the mammogram is digitized, noise is suppressed and contrast is improved. Second, image segmentation is performed to locate the suspicious regions, which are treated as the Regions of Interest (ROI); this differs from the common definition of segmentation in image processing. Third, low-level features are extracted and selected for classifying lesion types or removing false positives. Finally, the detection or classification of masses is performed. Although the CAD schemes were developed independently using different data sets of limited size, most of them yielded similar performance, in the range of 85-95 percent true positive rate with 1-2 percent false positives.

In this paper, we use a soft-decision approach from the HSV color space for identifying microcalcification masses present in digital mammograms. The K-means clustering algorithm is applied to partition the digital mammogram into meaningful regions. The Coefficient of Variation (CV) of the regions of the segmented mammogram is used as a statistical measure for identifying microcalcification masses. The soft-decision based gray content estimation from the HSV color space is presented in Section 2. In Section 3, we present pixel grouping by the K-means clustering algorithm and the post-processing schemes. The experimental results are presented in Section 4, and we conclude the paper in the last section.
II. SOFT-DECISION BASED GRAY CONTENT ESTIMATION FROM THE HSV COLOR SPACE

The content of a mammogram can be discriminated into four intensity levels. However, the mass present in a mammogram is known to be surrounded by smooth boundaries. Thus, it is essential to capture the boundary information through pixel values in order to partition the masses in the spatial domain. We perform spatial-domain processing in the HSV color space, since this color space is closely related to human visual perception of color and gray pixels. For each pixel, a weighted value is calculated from the soft-decision function; it captures the degree of gray content of the pixel. The weight function has been found to be robust against noise [11]. The function is given below:

$$GW(S, I) = 1 - S^{\,r_1 (255/I)^{r_2}} \quad \text{for } \neg(R = G = B) \qquad (1)$$

The range of GW(S, I) is [0, 1], and it estimates the degree of gray content of a pixel using both the saturation and intensity values. From Eq. 1, the gray weight value holds for !(R = G = B). This is because for R = G = B the saturation of a pixel is zero, so the saturation term vanishes and the gray weight no longer depends on the intensity value. To capture the degree of gray content of such a pixel effectively, we slightly perturb the value of R, G or B, which changes the saturation value of the pixel. Further, the function varies smoothly with saturation and is continuous [12], as shown in Figure 1. It is evident from Fig. 1 that r1 should take a value slightly higher than 0.0 and r2 a value slightly less than 1.0 to obtain smoothly varying gray weight values. From our earlier work, the suitable values are found to be r1 = 0.1 and r2 = 0.85.
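As an illustration, the gray weight computation can be sketched in Python. This is a minimal sketch, assuming Eq. 1 has the form GW(S, I) = 1 - S^(r1 (255/I)^r2), with S taken from the standard-library colorsys HSV conversion and I taken as 255 times the HSV value component. These choices, the one-level perturbation of the blue channel, the handling of I = 0, and the function names are assumptions of the sketch and are not specified in the text.

```python
import colorsys

R1, R2 = 0.1, 0.85  # parameter values reported in the text


def gray_weight(s, i, r1=R1, r2=R2):
    """Gray weight for saturation s in [0, 1] and intensity i in (0, 255].

    Assumed form: GW(S, I) = 1 - S ** (r1 * (255 / I) ** r2).
    """
    if i <= 0:
        # Assumption: avoid division by zero by treating I = 0 as fully gray.
        return 1.0
    return 1.0 - s ** (r1 * (255.0 / i) ** r2)


def gray_weight_rgb(r, g, b):
    """Gray weight of an 8-bit RGB pixel.

    When R = G = B, one channel is perturbed by a single level so that the
    saturation is non-zero (the size of the perturbation is an assumption).
    """
    if r == g == b:
        b = b + 1 if b < 255 else b - 1
    _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return gray_weight(s, 255.0 * v)
```

Under these assumptions, a weakly saturated pixel receives a larger gray weight than a strongly saturated pixel of the same intensity, and the result always lies in [0, 1].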

[Figure 1: plot against Intensity (0 to 240), with one curve for each of Sat = 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0]

Fig. 1. Variation of Partial Derivative of GW (S, I) with S for Different Values of Saturation with r1=0.1, r2=0.85

III. PIXEL GROUPING BY K-MEANS CLUSTERING ALGORITHM AND POST PROCESSING


We perform a series of post-processing steps after applying the clustering algorithm to the weighted gray values of the mammogram pixels, which are calculated using Eq. 1. The clustering problem is to represent the mammogram as a set of n non-overlapping partitions, as given below:

$$I = \{O_1 \mid O_2 \mid O_3 \mid \dots \mid O_n\} \qquad (2)$$

Fig. 2. Result of K-means clustering for grouping the pixels and post-processing: (a) Original mammogram (b) After applying K-means clustering (c) After applying connected component analysis (d), (e) and (f) Final refined mammogram

In Figure 3, we also show the segmented mammogram images without performing connected component analysis.

Here, each Oi consists of the positions of all its image pixels together with its equivalent gray weight, size and center. We use K-means clustering for pixel grouping. We start with K=2 and adaptively increase the number of clusters until the maximum number of clusters, which is 8, is reached. The result of clustering and post-processing is shown in Figure 2. Fig. 2(a) shows a mammogram with a large, irregularly shaped spiculated lesion. Fig. 2(b) shows the transformed image after it is clustered based on gray weight. It can be seen from Fig. 2(b) that the clustered pixels do not yet carry sufficient information about the various regions in the image, because it is not yet known whether all the pixels that belong to the same cluster are actually part of the same region. To ascertain this, we perform a connected component analysis [9] of the pixels to determine the different regions in the mammogram. We also identify the connected components whose size is less than a certain percentage of the size of the mammogram. These small regions are to be merged with the surrounding clusters in the next step; such candidate regions for merging are shown in white in Fig. 2(c). In the last post-processing step, the small regions are merged with the surrounding regions with which they have maximum overlap. The mammogram at the end of this step is shown in Fig. 2(d). It can be seen that the various foreground and background objects of the mammogram have been clearly segmented. The segmented mammograms for analysis and interpretation are shown in Fig. 2(e)-(f).
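The clustering and connected component steps described above can be sketched as follows. This is a minimal sketch, assuming plain Lloyd-style K-means on scalar gray weights with evenly spread initial centers, and 4-connectivity for the component analysis. The rule for adaptively increasing K from 2 to 8 and the size percentage are not given in the text, so K is passed in directly and `min_fraction` is a hypothetical parameter; all function names are likewise illustrative.

```python
from collections import deque


def kmeans_1d(values, k, iters=50):
    """Lloyd-style K-means on scalar values; returns (labels, centers)."""
    lo, hi = min(values), max(values)
    # Assumption: initial centers are spread evenly over the value range.
    centers = [lo + (hi - lo) * (c + 0.5) / k for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def connected_components(label_grid):
    """Split each cluster into 4-connected regions; returns (region_grid, sizes)."""
    h, w = len(label_grid), len(label_grid[0])
    region = [[-1] * w for _ in range(h)]
    sizes = []
    for y in range(h):
        for x in range(w):
            if region[y][x] != -1:
                continue
            rid = len(sizes)
            region[y][x] = rid
            queue, count = deque([(y, x)]), 0
            while queue:
                cy, cx = queue.popleft()
                count += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and region[ny][nx] == -1
                            and label_grid[ny][nx] == label_grid[y][x]):
                        region[ny][nx] = rid
                        queue.append((ny, nx))
            sizes.append(count)
    return region, sizes


def small_regions(sizes, total_pixels, min_fraction):
    """Ids of regions smaller than min_fraction of the image (merge candidates)."""
    return [rid for rid, s in enumerate(sizes) if s < min_fraction * total_pixels]
```

For a gray-weight image stored as a list of rows, the pixels would be flattened, clustered with `kmeans_1d`, reshaped back into a grid of cluster labels, and then passed to `connected_components`. Merging each small region into the neighbor with maximum overlap is left out of the sketch.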

Fig. 3. Mammogram segmentation with different K: (a) Original mammogram (b) Segmented mammogram with K=4 (c) Segmented mammogram with K=6 (d) Segmented mammogram with K=8.

It is observed from Fig. 3 that the microcalcification masses present are very small, with ill-defined boundaries. In this case, using connected component analysis to merge smaller regions with neighboring larger regions results in loss of information about the microcalcification mass region. Thus, for our experiments, we adaptively decide whether or not to use connected component analysis. In another post-processing approach, we group the regions into larger and smaller ones by choosing a suitable threshold on the size of the regions, as given below:

$$I_1 = \{O_1 \mid O_2 \mid O_3 \mid \dots \mid O_k\} \qquad (3)$$

$$I_2 = \{O_{k+1} \mid O_{k+2} \mid O_{k+3} \mid \dots \mid O_n\} \qquad (4)$$

where k < n. As a result of this partition, the background information is also eliminated. In addition, regions whose gray weight is less than the average gray weight of the entire mammogram are discarded and not considered for further calculation, so as to avoid small background regions. Finally, since the microcalcification mass area in a mammogram is small, it falls in the smaller partition.
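The size-based grouping and the gray weight filter can be sketched as follows. This is a minimal sketch, assuming each region is represented simply as a list of the gray weights of its pixels and that a region's gray weight is its mean. The threshold value, this representation, and the function names are hypothetical, since the text does not specify them.

```python
from statistics import mean


def partition_by_size(regions, size_threshold):
    """Split regions into (larger, smaller) groups by pixel count."""
    larger = [r for r in regions if len(r) >= size_threshold]
    smaller = [r for r in regions if len(r) < size_threshold]
    return larger, smaller


def drop_low_gray_weight(regions, all_pixels):
    """Keep only regions whose mean gray weight reaches the image-wide mean."""
    global_mean = mean(all_pixels)
    return [r for r in regions if mean(r) >= global_mean]


# Illustrative values only, not taken from the experiments.
regions = [[0.1] * 10, [0.8, 0.9, 0.85], [0.05, 0.1]]
all_pixels = [v for r in regions for v in r]
larger, smaller = partition_by_size(regions, size_threshold=5)
candidates = drop_low_gray_weight(smaller, all_pixels)
```

In this toy input, the ten-pixel region goes to the larger group, and of the two small regions only the one with high gray weight remains as a candidate.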
IV. EXPERIMENTAL RESULTS


We have used mammograms from the Digital Database for Screening Mammography (DDSM) of the Computer Vision / Image Analysis Research Laboratory at the University of South Florida (http://marathon.csee.usf.edu/Mammography/DDSM) for carrying out our experiments, and considered the malignancy category only. We have used only the gray level information and the degree of grayness captured through the gray weight function. The Coefficient of Variation (CV) has been used as the statistical measure for identifying the microcalcification masses. The CV is the ratio of the standard deviation to the mean. The advantage of using the CV is that the standard deviation of data should always be interpreted in the context of the mean of the data. Since the CV is a dimensionless number, it should be used instead of the standard deviation when comparing data sets with different dimensions, units or widely different means. We found the CV measure to be more suitable, as the regions present in a mammogram have different sizes.
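The CV of a region, and the selection of the region with the highest CV, can be sketched as follows. This is a minimal sketch, assuming the population standard deviation is used and that a region with zero mean is reported as 0.0; both are assumptions of the sketch, as are the function names and the sample values.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean of a region's gray weights."""
    m = mean(values)
    if m == 0:
        # Assumption: an undefined CV (zero mean) is reported as 0.0.
        return 0.0
    return pstdev(values) / m


def highest_cv_region(regions):
    """Index of the region with the largest CV."""
    cvs = [coefficient_of_variation(r) for r in regions]
    return max(range(len(regions)), key=lambda i: cvs[i])


# Illustrative gray weights only, not taken from the experiments.
regions = [[0.5, 0.5, 0.5], [0.2, 0.8, 0.5], [0.4, 0.5, 0.6]]
best = highest_cv_region(regions)
```

Because the CV divides by the mean, regions of different size and brightness can be compared on the same scale, which is the property the text relies on.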

TABLE 1. COEFFICIENT OF VARIATION (CV) OF VARIOUS REGIONS OF MAMMOGRAM WITH CANCER

Mammogram   Region 1   Region 2   Region 3   Region 4   Region 5   Region 6   Region 7   Region 8   Region 9   Region 10
C-0001-1    0          0          0          0          0.09868    0.81863    0.03808    0.00224    0.020554   1.7683
C-0002-1    0.00036    0.00597    0.0734     0.0017     0.00068    0.00052    0.00013    0          0.000111   0.009343
C-0003-1    0          0          0          0.0002782  0.0158686  0          0.0002724  0.076804   0.0005980
C-0004-1    0.00503    0          0          0.0003     0          0.005735   0.014623   0.000552   0.002074   0.05588
C-0006-1    0          0.03225    0.0003     0          0.00613    0          0.001753   0          0.00049    0.052279
C-0007-1    0.001262   0          0.003033   0.0108     0.000244   0.000771   0.001624   0.001999   0.000514   0.1508
C-0009-1    0.01377    0.000319   0          0          0.0026     0          0          0          0.001878   0.07629
C-00010-1   0          0          0.0005     0          0          0.0075     0.01261    0.000227   0.000849   0.075549

(Only nine values appear for C-0003-1 in the source; they are listed in order, so their column alignment is uncertain.)

In Table 1, we show the CV for various regions present in the cancer category of DDSM mammograms. Only the smaller regions are considered for the calculation of CV, and the region with the higher value of CV is highlighted. From our experimental results, it can be noticed that the regions with higher values of CV are identified as the microcalcification mass area, compared to the rest of the regions of the mammogram. This is because the gray weight values of the pixels of the microcalcification mass region are expected to show smooth texture behavior, which may not be distinguishable by a human. The weighted gray value captures this smooth variation, and it is measured in terms of CV. In Table 1, a region with zero value indicates that either the mean or the standard deviation of the respective region is zero, and thus the coefficient of variation is either infinity or zero, respectively.

V. CONCLUSION

Identification of microcalcification masses present in digital mammograms is considered to be a difficult task. We have used a soft-decision approach for identifying the microcalcification mass present in digital mammograms. The Coefficient of Variation (CV) is used as a measure for identifying the region. The experimental results are encouraging. As a future direction for this work, we will use other statistical measures such as smoothness, entropy, skewness and uniformity, and construct a statistical feature vector. We will also propose a Neural Network architecture with a suitable training and learning algorithm to classify the microcalcification regions.

Acknowledgement: The work done by Dr. A.Vadivel is supported by research grants from Indo-US Science and Technology Forum (IUSSTF) Ref. No. IUSSTF/Fellowship/2007-08/5-2008 and the Department of Science and Technology, India, under Grant SR/FTP/ETA-46/07 dated 25th October, 2007.

References

[1] K. Bovis, S. Singh, J. Fieldsend, C. Pinder, Identification of masses in digital mammograms with MLP and RBF nets, in: Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks Com, 2000, pp. 342-347.
[2] H.D. Cheng, X.P. Cai, X.W. Chen, L.M. Hu, X.L. Lou, Computer-aided detection and classification of microcalcifications in mammograms: a survey, Pattern Recognition, Vol. 36, 2003, pp. 2967-2991.
[3] I. Christoyianni, E. Dermatas, G. Kokkinakis, Fast detection of masses in computer-aided mammography, IEEE Signal Process. Mag., Vol. 17 (1), 2000, pp. 54-64.
[4] A.S. Constantinidis, M.C. Fairhurst, F. Deravi, M. Hanson, C.P. Wells, C. Chapman-Jones, Evaluating classification strategies for detection of circumscribed masses in digital mammograms, in: Proceedings of 7th International Conference on Image Processing and its Applications, 1999, pp. 435-439.
[5] K. Doi, Computer-aided diagnosis: potential usefulness in diagnostic radiology and telemedicine, in: Proceedings of National Forum 95, 1996, pp. 9-13.
[6] R.L. Egan, R.C. Mosteller, Breast cancer mammography patterns, Cancer, Vol. 40, 1977, pp. 2087-2090.
[7] T.J. Rissanen, H.P. Makarainen, M.A. Apaja-Sarkkinen, E.L. Lindholm, Mammography and ultrasound in the diagnosis of contralateral breast cancer, Acta Radiol., Vol. 36, 1995, pp. 358-366.
[8] G.F. Robbins, J.W. Berg, Bilateral primary breast cancers, A Prospective Clinicopathol. Study, Cancer, Vol. 17, 1964, pp. 1501-1527.
[9] G. Stockman and L. Shapiro, Computer Vision, Prentice Hall, 2001.
[10] H.H. Storm, O.M. Jensen, Risk of contralateral breast cancer in Denmark 1943-80, Br. J. Cancer, Vol. 54, 1986, pp. 483-492.
[11] A. Vadivel, Shamik Sural and A.K. Majumdar, An Integrated Color and Intensity Co-Occurrence Matrix, Pattern Recognition Letters, Elsevier Science, Vol. 28(8), pp. 974-983, 2007.
[12] A. Vadivel, Shamik Sural and A.K. Majumdar, Robust Histogram Generation from the HSV Space based on Visual Colour Perception, International Journal of Signal and Imaging Systems Engineering (in press).
[13] J.N. Wolfe, Breast patterns as an index of risk for developing breast cancer, Am. J. Roentgen. 126 (1976) 1130-1139.
[14] B. Zheng, Y.H. Chang, X.H. Wang, W.F. Good, D. Gur, Application of a Bayesian belief network in a computer-assisted diagnosis scheme for mass detection, SPIE Conference on Image Processing, Vol. 3661 (2), 1999, pp. 1553156


B.Surendiran is a research scholar in the Department of Computer Applications, National Institute of Technology, Tiruchirappalli, India. His research interests include digital mammogram analysis, image processing and computer networks.

Dr. A Vadivel is currently with the Department of Computer Applications, National Institute of Technology, Tiruchirappalli, India. He obtained both his Masters and PhD from Indian Institute of Technology, Kharagpur, India. His research interests include medical image processing, content based image and video retrieval, search engine architecture design with multi-feature support and SAR image analysis. He is an Indo-US Research Fellow 2008. He was also conferred the Young Scientist Award by the Department of Science and Technology, Govt. of India in 2007.

Dr. Henry Selvaraj is Chair and Professor of the Department of Electrical and Computer Engineering, University of Nevada, Las Vegas, USA. He earned his Ph.D. at Warsaw University of Technology, Warsaw, Poland. Before joining UNLV in 1998, Dr Selvaraj was a faculty member at Monash University, Australia. His research interests include Logic Synthesis, Digital Design, Programmable Devices, Artificial Intelligence, Multiple Valued Functions, Digital Signal Processing, Bio-medical Image Processing, Networks and Path Planning. He was conferred the UNLV Alumni Student Centered Faculty Award for 2002. He is a member of the Curriculum Committee, Desert Pines High School (AOIT), Las Vegas, a member of the UNLV Faculty Senate (since 1999), and an alternate member of the Senate Tenure and Promotion Committee. He has been general chair for many international conferences.
