
International Journal of Engineering Research & Technology (IJERT)

ISSN: 2278-0181
Vol. 3 Issue 3, March - 2014

Feature Extraction and Classification of Lung
Cancer Nodule using Image Processing

Khin Mya Mya Tun
M.E Thesis Student, Department of Electronic Engineering
Mandalay Technological University
Mandalay, Myanmar

Aung Soe Khaing
Associate Professor, Department of Electronic Engineering
Mandalay Technological University
Mandalay, Myanmar

Abstract— Lung cancer is a common cause of death among people throughout the world. In this paper, a median filter is used for image pre-processing. For segmentation, Otsu's thresholding method is used. In feature extraction, physical dimensional measures and the gray-level co-occurrence matrix (GLCM) method are used. An artificial neural network (ANN) is applied for classification of disease stages. Computed tomography (CT) scan images are suitable for lung cancer diagnosis. This paper implements feature extraction and classification of lung cancer nodules using image processing techniques. The algorithm is implemented in MATLAB. This technique can help radiologists and doctors to know the condition of the disease at early stages and to avoid serious disease stages for lung cancer patients.

Keywords— Median filter, Otsu's thresholding, GLCM, ANN, MATLAB

I. INTRODUCTION

Mortality from lung cancer is expected to continue rising, to around 17 million worldwide in 2030. Early detection of lung cancer can increase the chance of survival. There are many techniques to diagnose lung cancer, such as Chest Radiograph (X-ray), Computed Tomography (CT), Magnetic Resonance Imaging (MRI scan) and Sputum Cytology. However, most of these techniques are expensive and time consuming. Therefore, there is a great need for a new technology to diagnose lung cancer in its early stages. Image processing techniques provide a good quality tool for improving the manual analysis.

The lungs are a pair of sponge-like, cone-shaped organs [1]. The right lung has three lobes and is larger than the left lung, which has two lobes. The anatomy of the lung is shown in Fig. 1. Lung cancer is a disease of abnormal cells multiplying and growing into a nodule. Fig. 2 describes the beginning of the cancer. Lung cancer is divided into four stages. In stage I, the cancer is confined to the lung. In stages II and III, the cancer is confined to the chest (with larger and more invasive tumors classified as stage III). Stage IV cancer has spread from the chest to other parts of the body.

Fig. 1. Anatomy of lung [2]

Fig. 2. The beginning of cancer [2]

In previous work by A. Amutha and R.S.D Wahidabanu [3], Level Set-Active Contour Modeling was used as a method for diagnosing lung tumors. The first step removed noise from the image using a kernel-based non-local neighborhood denoising function, and feature extraction based on the histogram was used to classify between normal and abnormal classes. At the final step, tumor detection, level set-active contour modeling with minimized gradient was applied to the image. In another study [4], auto-enhancement, Gabor filter and Fast Fourier transform (FFT) were used to enhance the image, and thresholding and watershed segmentation were used to segment the image. For feature extraction, binarization and masking approaches were applied. N.A. Memon et al. [5] proposed a thresholding method which selects the threshold based on the object and background pixel means. Region growing is then used to extract the exact cavity region with accuracy. This paper presents image collection, image preprocessing, image segmentation, feature extraction, and classification of disease stages.

IJERTV3IS031882 www.ijert.org

II. METHODS

In this paper, the system consists of five basic steps. Fig. 3 illustrates a block diagram of the lung cancer nodule feature extraction and classification system. The first step gathers CT images (stage 1, stage 2, stage 3 and stage 4) from the available database. The second step applies a median filter for image pre-processing to get the best level of quality and clearness. The third step is image segmentation, which uses Otsu's thresholding method, and the fourth step is the calculation of the extracted features. The final step is the classification of disease stages using a neural network.

Fig. 3. Block diagram of Lung Cancer Nodule Feature Extraction and Classification System (blocks: Image Collection, Median filter, Otsu's thresholding, Calculated features, Classification)

A. Image Collection

The foremost step in medical image processing is the collection of images. CT scans of the lung are given as input to this system. Computed tomography (CT) images have better clarity, low noise and low distortion for lung diagnosis. The lung CT images are collected from Mandalay General Hospital in Myanmar. The medical data is usually in DICOM format, which is the standard for storage and transfer of medical images [6]. The images are 512x512 pixels in size. Fig. 4 shows the original CT lung image with nodule.

Fig. 4. Original CT lung image with nodule

B. Median filter

Image preprocessing is a way to improve the quality of an image, so that the filtered image is better than the original one. The input CT image contains noise such as white noise and salt-and-pepper noise. Therefore, an image preprocessing stage is needed to eliminate this noise.

The median filter is one of the filter methods of image pre-processing and is used here to remove the noise from the images. Median filtering is a simple, intuitive and easy-to-implement method of smoothing images, i.e. reducing the amount of intensity variation between one pixel and the next. The median filter is a non-linear tool. It is normally used to reduce salt-and-pepper noise in an image, and it often does a better job than other filters of preserving useful detail in the image.

The median is calculated by first sorting all the pixel values from the surrounding neighborhood into numerical order and then replacing the pixel being considered with the middle pixel value [7]. If the neighborhood under consideration contains an even number of pixels, the average of the two middle pixel values is used. In general, a 3x3 mask is most commonly used. In this paper, the mask size of the filter is 15x15 because the larger the mask, the more noise is eliminated. The output of the median filter is shown in Fig. 5. This filtered image is used as the input for image segmentation.

Fig. 5. Output of median filtered image

C. Otsu's thresholding

Thresholding is one of the most powerful tools for image segmentation. The thresholded image has the advantages of smaller storage space, fast processing speed and ease of manipulation, compared with a gray-level image, which usually contains 256 levels [8]. Thresholding is divided into two approaches: global thresholding and local thresholding. Otsu's thresholding method is one of the global thresholding methods.
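The median filtering step of subsection B can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' MATLAB code: the function name `median_filter`, the tiny test image and the border handling (the window is simply clipped at the image edge, which is an assumption and differs from zero padding) are all chosen for the example. The paper uses a 15x15 mask; a 3x3 mask is used here to keep the example readable.

```python
from statistics import median


def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D list of gray levels.

    Assumption for this sketch: at the border the window is clipped to the
    image, so it may hold an even number of pixels. statistics.median then
    returns the average of the two middle values, as described in the text.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbourhood, sort it implicitly via median().
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = median(window)
    return out


# A flat region with one "salt" pixel (255): the outlier is replaced by the
# median of its neighbourhood, while the flat region is left unchanged.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
]
clean = median_filter(noisy, size=3)
```

Because the median ignores a single extreme value in the window, the isolated bright pixel disappears without blurring the surrounding region, which is the property the paper relies on.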

Otsu's thresholding is a non-linear operation that converts a gray-scale image into a binary image, where the two levels are assigned to pixels that are below or above the specified threshold value [9]:

$$ g(x,y) = \begin{cases} 1, & f(x,y) > T \\ 0, & f(x,y) \le T \end{cases} \qquad (1) $$

where g(x,y) is the output image, f(x,y) is the input image and T is the threshold value.

Otsu's thresholding method is based on selecting the lowest point between two classes (peaks), and the threshold is determined statistically. Otsu suggested minimizing the weighted sum of within-class variances of the object and background pixels to establish an optimum threshold. Recall that minimization of within-class variances is equivalent to maximization of between-class variance [10]. Frequency and mean value are calculated by the following equations.

Frequency:

$$ \omega = \sum_{i=0}^{T} P(i), \qquad P(i) = \frac{n_i}{N} \qquad (2) $$

where n_i is the number of pixels in level i and N is the total pixel number.

Mean:

$$ \mu = \sum_{i=0}^{T} \frac{i\,P(i)}{\omega} \qquad (3) $$

The variation of the mean values for each class from the overall intensity mean of all pixels is the between-class variance:

$$ \sigma_b^2 = \omega_0(\mu_0 - \mu_t)^2 + \omega_1(\mu_1 - \mu_t)^2 \qquad (4) $$

Substituting

$$ \mu_t = \omega_0\mu_0 + \omega_1\mu_1 \qquad (5) $$

gives

$$ \sigma_b^2 = \omega_0\,\omega_1\,(\mu_1 - \mu_0)^2 \qquad (6) $$

where \omega_0, \omega_1, \mu_0 and \mu_1 stand for the frequencies and mean values of the two classes, respectively.

The threshold value derived from this method lies between 0 and 1, and the segmented image is obtained from it. In this paper, the Otsu's threshold value is 0.498. Fig. 6 shows the output of Otsu's thresholding.

Fig. 6. Output of Otsu's thresholding image

After segmentation of the images, morphological operations are needed to obtain the individual lung and to remove unnecessary parts. Morphology is a technique of image processing based on shapes. A structuring element is a shape mask used in the basic morphological operations [11]. The basic morphological operations are dilation and erosion.

Dilation is an operation that 'grows' or 'thickens' objects in a binary image. The dilation of a set A by another set B is denoted A ⊕ B and defined by:

$$ A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \emptyset \,\} \qquad (7) $$

where \hat{B} is the reflection of B. Set B is also referred to as the dilation mask or structuring element (STREL). This definition means that the dilation of A by B is done by reflecting B and then shifting B over A by z. The result is the set of all displacements z such that B and A overlap by at least one element.

Erosion is an operation that 'shrinks' or 'thins' objects in a binary image. Erosion produces the opposite effect of dilation:

$$ A \ominus B = \{\, z \mid (B)_z \cap A^{c} = \emptyset \,\} \qquad (8) $$

In other words, the erosion of A by B is the set of all points traversed by the center of B such that B is totally contained within A at all times.

Applying morphological operations to the output of Otsu's thresholding removes the unwanted parts of the background, as shown in Fig. 7. In this paper, erosion is further used to get the segmented lung nodule shown in Fig. 8.

Fig. 7. Output of Morphological operation for abnormal lung

Fig. 8. Output of segmented lung nodule

D. Feature extraction

After the segmentation is performed, the segmented lung nodule is used for feature extraction. A feature is a significant piece of information extracted from an image which provides a more detailed understanding of the image. Geometric and intensity-based statistical features are extracted. This paper focuses on a region of interest (ROI) that is only the lung nodule. Shape measurements are physical dimensional measures that characterize the appearance of an object. Only these shape features were considered: area, perimeter and eccentricity.
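The threshold selection of equations (1) to (6) can be illustrated with a short plain-Python sketch. It is not the authors' implementation (the paper obtains its value with the MATLAB `graythresh` command); the function names `otsu_threshold` and `binarize` and the small test image are hypothetical. The sketch scans every candidate gray level T and keeps the one that maximises the between-class variance of equation (6).

```python
def otsu_threshold(histogram):
    """Return the gray level T that maximises the between-class variance.

    histogram[i] is n_i, the number of pixels at gray level i.
    Class 0 holds levels 0..T, class 1 holds the remaining levels.
    """
    total = sum(histogram)                                   # N
    total_sum = sum(i * n for i, n in enumerate(histogram))  # sum of i * n_i
    best_t, best_var = 0, -1.0
    count0 = 0   # number of pixels in class 0 so far
    sum0 = 0     # sum of gray levels in class 0 so far
    for t, n in enumerate(histogram):
        count0 += n
        sum0 += t * n
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue                                  # one class is empty
        w0 = count0 / total                           # omega_0
        w1 = count1 / total                           # omega_1
        mu0 = sum0 / count0                           # mu_0
        mu1 = (total_sum - sum0) / count1             # mu_1
        var_between = w0 * w1 * (mu1 - mu0) ** 2      # equation (6)
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image, t):
    """Equation (1): 1 where f(x, y) > T, otherwise 0."""
    return [[1 if v > t else 0 for v in row] for row in image]


# Small 8-bit example: a dark background with a bright region on the right.
image = [
    [12, 14, 13, 200],
    [11, 15, 210, 205],
    [13, 12, 14, 198],
]
hist = [0] * 256
for row in image:
    for v in row:
        hist[v] += 1

T = otsu_threshold(hist)
mask = binarize(image, T)
```

Here T is an integer gray level. Dividing it by the largest gray level (255 for 8-bit data) gives a normalised value between 0 and 1, which is the scale on which the paper reports its threshold of 0.498.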

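The set definitions of dilation and erosion in equations (7) and (8) translate directly into code. The following plain-Python sketch is illustrative only: the function names, the cross-shaped structuring element and the convention that pixels outside the image count as background are assumptions, since the paper does not state which structuring element it used.

```python
def _pixel(image, r, c):
    """Return the pixel value, treating everything outside the image as 0."""
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return image[r][c]
    return 0


def _offsets(strel):
    """Offsets (dr, dc) of the 1-entries of strel relative to its centre."""
    cr, cc = len(strel) // 2, len(strel[0]) // 2
    return [(i - cr, j - cc)
            for i, row in enumerate(strel)
            for j, v in enumerate(row) if v]


def dilate(image, strel):
    """Equation (7): z is kept if the reflected B, shifted to z, hits A."""
    offs = _offsets(strel)
    return [[1 if any(_pixel(image, r - dr, c - dc) for dr, dc in offs) else 0
             for c in range(len(image[0]))]
            for r in range(len(image))]


def erode(image, strel):
    """Equation (8): z is kept if B, shifted to z, lies entirely inside A."""
    offs = _offsets(strel)
    return [[1 if all(_pixel(image, r + dr, c + dc) for dr, dc in offs) else 0
             for c in range(len(image[0]))]
            for r in range(len(image))]


cross = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
eroded = erode(blob, cross)          # only the centre of the 3x3 blob survives
dilated = dilate([[0, 0, 0],
                  [0, 1, 0],
                  [0, 0, 0]], cross)  # a single pixel grows into the cross
```

Erosion shrinks the 3x3 blob to its centre pixel, and dilation grows a single pixel into the shape of the structuring element, matching the 'shrinks' and 'grows' descriptions in the text.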
The physical dimensional measures are defined as follows:

1) Area: The area is obtained by the summation of the areas of the pixels that are registered as 1 in the binary image obtained [12].

$$ A = n\{1\} \qquad (9) $$

where n{ } represents the count of the pattern within the curly brackets.

2) Perimeter: The perimeter (length) is the number of pixels in the boundary of the object. Perimeter P is measured as the sum of the distances between every pair of consecutive boundary points [13].

$$ P = |s_n - s_1| + \sum_{i=1}^{n-1} |s_i - s_{i+1}| \qquad (10) $$

where s = {s_1, …, s_n} is the set of boundary points.

3) Eccentricity: The eccentricity is the ratio of the distance between the foci of the ellipse and its major axis length. The value is between 0 and 1.

After calculating the physical dimensional measures, image texture features are also calculated on the quantized image using the gray-level co-occurrence matrix (GLCM) method, one of the best-known texture analysis methods. A gray-level co-occurrence matrix is a second-order statistical measure introduced by Haralick [14]. The GLCM is also known as the gray-level spatial dependence matrix and is based on the extraction of a gray-scale image. The GLCM functions characterize the texture of an image by calculating how often pairs of pixels with specific values and in a specified spatial relationship occur in an image, creating a GLCM, and then extracting statistical measures from this matrix [15]. Statistical parameters calculated from GLCM values are as follows:

4) Entropy: The statistical measure of randomness that can be used to characterize the texture of the input image.

$$ \text{Entropy} = -\sum_{i}\sum_{j} p(i,j)\,\log p(i,j) \qquad (11) $$

5) Contrast: Measures the local variations in the GLCM. It calculates the intensity contrast between a pixel and its neighbor pixel for the whole image. Contrast is 0 for a constant image.

$$ \text{Contrast} = \sum_{i}\sum_{j} (i-j)^2\, p(i,j) \qquad (12) $$

6) Correlation: Measures the joint probability of occurrence of the specified pixel pairs.

$$ \text{Correlation} = \sum_{i=0}^{G-1}\sum_{j=0}^{G-1} \frac{(i-\mu_i)(j-\mu_j)\,p(i,j)}{\sigma_i\,\sigma_j} \qquad (13) $$

7) Energy: Provides the sum of squared elements in the GLCM. It is also known as uniformity or the angular second moment.

$$ \text{Energy} = \sum_{i}\sum_{j} \big(p(i,j)\big)^2 \qquad (14) $$

8) Homogeneity: Measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal.

$$ \text{Homogeneity} = \sum_{i}\sum_{j} \frac{p(i,j)}{1+|i-j|} \qquad (15) $$

where p(i,j) is the GLCM entry at location (i,j).

In this paper, eight features are extracted from the segmented image; the extracted values are shown in Fig. 9. This feature information is used as the input to classify the lung cancer stages.

Fig. 9. Extracted Feature Values from the Lung Nodule Image

E. Classification

Classification is the final step, which determines the disease stage and whether or not the patient's lung has a lung cancer nodule. The artificial neural network (ANN) is one of the classification methods commonly used in image processing. ANNs are collections of mathematical models that emulate the real neural structure of the brain [16]. An ANN has three layers: the input layer, the hidden layer and the output layer. The architecture of a general ANN is shown in Fig. 10. The common terminologies used in ANN include weight, bias and activation functions.

Fig. 10. Architecture of a general ANN (network inputs, input layer, hidden layer, output layer, network outputs)

The feed-forward neural networks are the simplest type of artificial neural network devised. In this network, the feature information moves in only one direction: forward from the input nodes, through the hidden nodes (if any), to the output nodes. In this paper, a feed-forward neural network is used. The input layer takes the eight features from the feature extraction step. There are 20 hidden layers. The output layer represents the four stages. The transfer function is log-sigmoid, which is more suitable than other transfer functions for this research.
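The GLCM texture measures of equations (11) to (15) can be sketched in plain Python as follows. This is an illustration, not the authors' MATLAB code. The pixel offset (each pixel paired with its right-hand neighbour), the base-2 logarithm in the entropy, the NaN returned for the correlation of a constant image, and the small 4-level test image are all assumptions made for the example.

```python
from math import log2, sqrt


def glcm(image, levels, dr=0, dc=1):
    """Normalised co-occurrence matrix p(i, j) for the pixel offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def texture_features(p):
    """Entropy, contrast, correlation, energy and homogeneity of a GLCM."""
    g = range(len(p))
    cells = [(i, j, p[i][j]) for i in g for j in g]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    sd_i = sqrt(sum((i - mu_i) ** 2 * v for i, j, v in cells))
    sd_j = sqrt(sum((j - mu_j) ** 2 * v for i, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        # Equation (11); zero entries contribute nothing to the sum.
        "entropy": -sum(v * log2(v) for i, j, v in cells if v > 0),
        # Equation (12)
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Equation (13); undefined (NaN) when either deviation is zero.
        "correlation": cov / (sd_i * sd_j) if sd_i * sd_j > 0 else float("nan"),
        # Equation (14)
        "energy": sum(v ** 2 for i, j, v in cells),
        # Equation (15)
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


# Quantised 4-level test image.
quantised = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(quantised, levels=4)
features = texture_features(p)

# A constant image has zero contrast, as stated in the text.
flat = texture_features(glcm([[1, 1], [1, 1]], levels=2))
```

For the test image the twelve horizontal pixel pairs give a contrast of 7/12 and an energy of 1/6, which can be checked by hand from the pair counts.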

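The forward pass of the feed-forward network in subsection E can be sketched as below. This is a hedged illustration rather than the authors' trained MATLAB network. The weights are random and untrained, the eight input values are placeholders, and the hidden part is modelled as a single layer of 20 log-sigmoid units, which is a simplifying assumption (the paper states 20 hidden layers without further detail).

```python
import math
import random


def logsig(x):
    """Log-sigmoid transfer function, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Create random weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


def forward(x, layers):
    """Propagate the inputs forward only: input -> hidden -> output."""
    for weights, biases in layers:
        x = [logsig(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


rng = random.Random(0)                # fixed seed so the sketch is repeatable
network = [
    make_layer(8, 20, rng),           # 8 input features -> 20 hidden units
    make_layer(20, 4, rng),           # 20 hidden units -> 4 output stages
]

# Placeholder feature vector (hypothetical values scaled to [0, 1]); real
# inputs would be the eight extracted features after suitable scaling.
sample = [0.2, 0.4, 0.7, 0.1, 0.05, 0.9, 0.95, 0.99]
outputs = forward(sample, network)
predicted_stage = outputs.index(max(outputs)) + 1   # stages numbered 1..4
```

With untrained weights the predicted stage carries no meaning; the sketch only shows how the eight features flow through the log-sigmoid layers to four outputs, the largest of which selects the stage.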
The training images consist of four images, one each for stage 1, stage 2, stage 3 and stage 4. The extracted features of the training images are given in Table I. The test feature data set consists of ten images.

TABLE I. INPUTS OF EXTRACTED FEATURES FOR TRAINING

Feature    Stage1    Stage2    Stage3    Stage4
Area       206       341       491       608

The remaining rows of the table give the Perimeter, Eccentricity, Entropy, Contrast, Correlation, Energy and Homogeneity of each stage.

After the neural network is created from the training file, which contains the known lung CT images, and the test file, which contains the unknown lung CT images, the training process starts. Fig. 11 shows the training of the neural network. When the mean square error reaches zero or the training time reaches the defined number of epochs, the training program stops automatically. After training, the graph of mean squared error (mse) against epochs can be seen, as shown in Fig. 12. The elapsed time for training this program is 0.01. The tested image and a result box appear as shown in Fig. 13. The identification result obtained with the neural network approach shows that it can be used efficiently in a lung cancer detection system.

Fig. 11. Neural Network Training

Fig. 12. Mean Squared Error Vs Epoch Graph

Fig. 13. (a) Tested Image, (b) Result Box

III. DISCUSSIONS

In the preprocessing step, the median filter is used to remove noise from the original lung CT image. The median filter is more suitable than other filters for this research because its main advantage is that the edges of the image are preserved even after the pixel intensity values are changed. Increasing the mask size is more effective in minimizing the impact of noise. Therefore, the median filter is chosen for this research.

In the segmentation step, Otsu's thresholding converts a gray-scale image to a binary image; the threshold value is 0.498 for this paper. This value is the automatic threshold value obtained using the 'graythresh' command in MATLAB. Otsu's thresholding method is more straightforward than other segmentation methods. After segmentation of the image, morphological operations are applied to get the individual lung and to eliminate unnecessary parts using erosion and dilation. The morphological operations not only isolate the individual lung but also make the lung nodule apparent.

In the feature extraction step, features are calculated from their formulas and are used for classifying the disease stage of the lung nodule. In the classification step, a feed-forward neural network is used. After training, the performance of the system is evaluated.
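The performance evaluation of this system uses the usual correct and incorrect classification rates computed from the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The arithmetic can be checked with a short sketch. The function name is hypothetical, and the counts are an assumption inferred from the reported results for the ten test images: five TP, one FP, no FN, and therefore four TN.

```python
def classification_rates(tp, tn, fp, fn):
    """Return (correct %, incorrect %) from the four outcome counts."""
    total = tp + tn + fp + fn
    correct = 100 * (tp + tn) / total      # correctly classified share
    incorrect = 100 * (fp + fn) / total    # misclassified share
    return correct, incorrect


# Assumed counts for the ten test images: TP = 5, TN = 4, FP = 1, FN = 0.
correct, incorrect = classification_rates(tp=5, tn=4, fp=1, fn=0)
```

With these counts the sketch gives 90% correct and 10% incorrect classification, which agrees with the rates reported for the system.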

The following equations (16) and (17) are used to evaluate the correct classification and the incorrect classification of this system.

$$ \text{Correct classification} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (16) $$

$$ \text{Incorrect classification} = \frac{FP + FN}{TP + TN + FP + FN} \times 100\% \qquad (17) $$

where
TP = True Positive: the image has a positive result and the diagnosis system reports the disease.
TN = True Negative: the image has a negative result and the diagnosis system does not report the disease.
FP = False Positive: the image has a negative result and the diagnosis system reports the disease.
FN = False Negative: the image has a positive result and the diagnosis system does not report the disease.

In classification, the training images are four images (stage 1, stage 2, stage 3 and stage 4). The test images are 10 images. When the images are tested, the test set of unknown categories of lung CT images is passed through the ANN classification system. Five images are true positive (TP) images and the rest are true negative (TN). The TP images are classified exactly correctly. An FP image is found once, and no FN is found among the incorrect classifications. Therefore, this system offers a correct classification of lung disease stage of 90% and an incorrect classification of 10%, as shown in the MATLAB command window in Fig. 14.

Fig. 14. Performance Evaluation

IV. CONCLUSIONS

The mortality rate of lung cancer is the highest among all types of cancer. If lung cancer is detected at an earlier stage, the patient's chance of survival increases. In this research, image preprocessing and image segmentation are implemented to obtain the diagnosis result. By using these steps, the nodules are detected and some features are extracted. The extracted features are used for classification of disease stages. In this paper, a feed-forward neural network is used to classify the lung cancer stages. The 20 hidden layers and the log-sigmoid transfer function are appropriate for improving the correct classification of the disease stages. This system can identify the condition of lung cancer at early stages, helping to avoid serious stages and to reduce the percentage of lung cancer in the population. Therefore, this method is inexpensive and takes little time. This technique helps radiologists and doctors by providing more information and supporting correct decisions for lung cancer patients in a short time with accuracy. The result of this system and the analysis of the doctor together increase the accuracy of detecting lung cancer nodules, so it can play a very important and essential role.

ACKNOWLEDGEMENT

First and foremost, the author would like to thank her supervisor, Associate Professor Dr. Aung Soe Khaing, for providing his valuable advice, helpful guidance and knowledge. The author sincerely wishes to thank the Head of the Department of Electronic Engineering, Associate Professor Dr. Hla Myo Tun at Mandalay Technological University. The author is very thankful to all her teachers from the Department of Electronic Engineering at Mandalay Technological University. Finally, the author appreciates the help from Mandalay General Hospital in supporting a large collection of lung CT images, which have been valuable for this research.

REFERENCES

[1] Non-Small Cell Lung Cancer, Adapted from National Cancer Institute (NCI) and Patients Living with Cancer (PLWC), 2007, [online available], www.katemacintyrefoundation., accessed on 4 June 2013.
[2] Anatomy of lung picture and beginning of cancer, [online available], http://www.allreferhealth., accessed on 13 July 2013.
[3] A. Amutha and R.S.D Wahidabanu, "A Novel Method for Lung Tumor Diagnosis and Segmentation using Level Set-Active Contour Modelling", European Journal of Scientific Research, Vol. 90, No. 2, November 2012, pp. 175-187.
[4] Mokhled S. Al-Tarawneh, "Lung Cancer Detection Using Image Processing Techniques", Leonardo Electronic Journal of Practices and Technologies, ISSN 1583-1078, Issue 20, January-June 2012, pp. 147-158.
[5] N.A. Memon et al., "Segmentation of Lungs from CT Scan Images for Early Diagnosis of Lung Cancer", World Academy of Science, Engineering and Technology, 2006.
[6] Digital Imaging and Communications in Medicine (DICOM) Part 1: Introduction and Overview, National Electrical Manufacturers Association, 2006.
[7] S Jayaraman, S Esakkirajan and T Veerakumar, "Digital Image Processing", Tata McGraw Hill, ISBN (10): 0-07-014479-6, ISBN (13): 978-0-07-014479-8.
[8] Gonzalez R.C., Woods R.E., "Digital Image Processing using MATLAB", 3rd edition, Upper Saddle River, NJ Prentice Hall, 2008.
[9] Nunes É.O., Pérez M.G., "Medical Image Segmentation by Multilevel Thresholding Based on Histogram Difference", presented at 17th International Conference on Systems, Signals and Image Processing, 2010.
[10] Huang Q., Gao W., Cai W., "Thresholding technique with adaptive window selection for uneven lighting image", Pattern Recognition Letters, Elsevier, pp. 801-808.
[11] Sudha.V, Jayashree.P, "Lung Nodule Detection in CT Images Using Thresholding and Morphological Operations", International Journal of Emerging Science and Engineering (IJESE), ISSN: 2319-6378, Volume-1, Issue-2, December 2012.
[12] Pratt, William K., "Digital Image Processing", New York, John Wiley & Sons, Inc., 1991, p. 634.
[13] Cigdem Demir and Bülent Yener, "Automated cancer diagnosis based on histopathological images: a systematic survey", Technical report, Rensselaer Polytechnic Institute, Department of Computer Science, TR-05-09.
[14] Haralick (1979), "Statistical and structural approaches to texture", Proceedings of the IEEE, vol. 67, pp. 786-804.
[15] Haralick, R.M., and L.G. Shapiro, "Computer and Robot Vision", Vol. 1, Addison-Wesley, 1992, p. 459.
[16] Mark Hudson Beale, Martin T. Hagan, Howard B. Demuth, "Neural Network Toolbox™ User's Guide", R2014a, MathWorks, Inc.

BIOGRAPHIES

Khin Mya Mya Tun was born in Mandalay, Myanmar, in 1990. She received her Bachelor of Technology (B.Tech) degree in 2011 and her Bachelor of Engineering (B.E) degree in 2012 in Electronics Engineering from Mandalay Technological University, Mandalay, Myanmar. She is now a Master of Engineering (M.E) thesis student at Mandalay Technological University. Her research interests include bio-medical image processing, texture analysis, and computer vision.

Aung Soe Khaing was born in Pyawbwe Township, Mandalay Division, Myanmar, in 1982. He received his Bachelor of Engineering in Electronics from Mandalay Technological University, Mandalay, Myanmar, in 2004 and his Master of Engineering in Electronics from Yangon Technological University, Yangon, Myanmar, in 2006. He continued with his PhD dissertation in 2006. From October 2008 to September 2010, he carried out research on Spatial Frequency Analysis of the Human Brain at the Institute of Biomedical Engineering and Informatics, Technical University Ilmenau, Germany. He received his PhD in Electronic Engineering from Mandalay Technological University, Mandalay, Myanmar, in 2011. He is now Associate Professor at the Department of Electronic Engineering, Mandalay Technological University, Mandalay, Myanmar. His research interests include computer-based Electrocardiogram (ECG) systems, biomedical signal and image processing, bioinstrumentation and telemedicine. Dr. Aung Soe Khaing was responsible for the ECG laboratory for the biomedical engineering students at the Institute of Biomedical Engineering and Informatics, Technical University Ilmenau, Germany, from October 2008 to September 2010.