
FEATURE SELECTION USING MODIFIED PARTICLE SWARM OPTIMIZATION FOR TUBERCULOSIS DETECTION
Mrs. R. Beulah Jeyavathana (Asst Professor), Department of CSE, Mepco Schlenk Engineering College, Sivakasi, India
Dr. R. Balasubramanian (HOD), Department of CSE, Manonmaniam Sundaranar University, Tirunelveli, India
M. Celin Supriya (PG Student), Department of CSE, Mepco Schlenk Engineering College, Sivakasi, India

Abstract - Tuberculosis is a dangerous infectious disease characterized by the growth of tubercles in the tissues. It mainly affects the lungs but can also affect other parts of the body. The disease can be diagnosed by radiologists. The main objective of this paper is to obtain the best solution by means of modified Particle Swarm Optimization (PSO); the selected solution is regarded as the optimal feature descriptor. Five stages are used to detect tuberculosis: image pre-processing, lung segmentation, feature extraction, feature selection, and classification. These stages are used in medical image processing to identify tuberculosis. In feature extraction, the GLCM approach is used to extract the features, and from the extracted feature sets the optimal features are selected by the modified PSO technique. Finally, a Support Vector Machine (SVM) classifier is used for image classification. Experiments were carried out and intermediate results were obtained. The proposed system achieves better classification accuracy than the existing method.

Index Terms - Tuberculosis, Segmentation, FCM clustering, Feature extraction, GLCM approach, Modified PSO technique, SVM classifier.

I. INTRODUCTION
Tuberculosis (TB) is an infectious bacterial disease caused by the organism Mycobacterium tuberculosis. It may affect any tissue of the body, but it mainly affects the lungs. TB is an airborne disease that spreads from one person to another through the air, by coughing or sneezing. TB affects all age groups in all parts of the world, mostly young adults and people living in developing countries. The symptoms of active lung TB are cough with sputum and, at times, blood, weight loss, chest pains, fever, weakness and night sweats. Tuberculosis bacteria present in sputum samples are identified under a microscope. This method detects only half of the tuberculosis cases and cannot detect drug resistance.
In 2015, around 11 million people fell ill with TB and 2 million people died from the disease. Over 95% of TB deaths occur in low- and middle-income countries. Six countries account for 60% of the total, with India leading the count, followed by Pakistan, China, Indonesia, Nigeria and South Africa. An estimated one million children became ill with TB and 170,000 children died of TB (excluding children with HIV).
Although several effective methods have been used to reduce the effect of TB, it remains the third-highest-rated disease causing death every year, since only X-rays are used for detection. X-rays do not easily reveal the early stage of tuberculosis. Detection is done manually by doctors or technicians looking at the X-rays, and reading images with the naked eye leaves more chance of wrongly predicting the intensity of the tuberculosis. Because of this risk of wrong prediction, automated detection of tuberculosis is used. To overcome the problems in existing methods, CT lung images are used for the diagnosis of tuberculosis.
In image processing, feature extraction is an important step and a special form of dimensionality reduction. When the input data is too large to be processed and is suspected to be redundant, it is transformed into a reduced set of feature representations. This transformation of the input data into a feature set is called feature extraction. Features contain information related to colour, shape, texture and context. Here, texture feature extraction is performed using GLCM. The modified PSO technique is an optimization search technique used to find optimal solutions. It selects the best features after the feature extraction process, and the search continues until the required solution is obtained. The images are then classified as normal or abnormal by the SVM classifier.

II. RELATED WORKS
Les Folio (2014) presented an automated approach for detecting tuberculosis in conventional posteroanterior chest radiographs. The lung region is extracted using a graph cut segmentation method. For the extracted region, a set of texture and shape features is computed, which enables the X-rays to be classified as normal or abnormal using a binary classifier. The performance of the system is measured on two datasets: a set collected by the tuberculosis control program of a local county health department in the United States, and a set collected by Shenzhen Hospital, China [1].
C. Bhuvaneswari and D. Loganathan (2014) proposed detecting lung diseases by effective feature extraction through moment invariants, feature selection through a genetic algorithm, and classification by Naïve Bayes and decision tree classifiers. Pre-processing removes noise, feature extraction extracts the useful features of the given image, and feature selection optimizes the top-ranking features relevant to the image. The classifiers are then employed to classify the images, and performance measures are reported for each [2].
Shuangfeng Dai, Ke Lu and Jiyang Dong (2015) provided a new method based on an improved graph cuts algorithm, improving the graph cuts energy in both the regional penalty term and the boundary penalty term for detection of tuberculosis [3].
Laurens Hogeweg and Clara I. Sánchez (2015) combined textural, focal, and shape abnormality subsystems into one system to deal with the heterogeneous abnormality expression in different populations. The performance is evaluated on a TB screening database and a TB suspect database using both an external and a radiological reference standard. The systems for detecting different types of TB-related abnormalities, their combination, and the two databases used for evaluation are described [4].
Wenbo Li and Yan Kang (2015) used a new adaptive VOI selection method. Twenty-two features were extracted to distinguish nodules from vascular endpoints or vascular cross structures, and an optimal feature combination selection framework was designed based on an improved genetic algorithm and a support vector machine. The improved GA selects the optimal feature combination from the feature pool to build the SVM classifier [5].
G. Vijaya and A. Suhasini identified cancer tumors in lung CT images using edge detection and boundary tracing. The feature extraction stage finds the size of the tumor based on area, perimeter, and irregularity index. The final stage, classification, separates tumors into benign or malignant using data mining classification techniques such as SMO (Sequential Minimal Optimization), the J48 decision tree, and Naive Bayes. The experimental results of these classification techniques are then compared to determine which one gives the most accurate answers [6].
Elmar Rendon-Gonzalez (2016) proposed a pre-processing step in which several masks are calculated using thresholding and morphological operations, thereby eliminating background and surrounding tissue. The suspicious Regions of Interest (ROIs) are calculated using a priori information and Hounsfield Units (HU). In feature extraction, numerous features are calculated in order to restrict the suspicious zones. Finally, a Support Vector Machine (SVM) algorithm is employed in the classification stage [7].
Mumini Olatunji Omisore (2014) proposed a genetic neuro-fuzzy inferential model for the diagnosis of tuberculosis. In this system, the Fuzzifier uses a triangular membership function to determine the degree of contribution of each decision variable, while the Defuzzifier adopts the Centroid of Area (CoA) defuzzification technique to generate a crisp output for a given diagnosis. Finally, SVM is used in the classification stage [8].
A. Zabidi and L. Y. Khuan (IEEE international conference, 2011) proposed Binary Particle Swarm Optimization for feature selection in the detection of infants with hypothyroidism. They investigated the effect of feature selection with Binary PSO on the performance of a Multilayer Perceptron classifier in discriminating between healthy infants and infants with hypothyroidism from their cry signals. The performance was examined by varying the number of coefficients [9].

III. SYSTEM DESIGN

Input Image Dataset → Preprocessing → Segmentation → Feature Extraction → Feature Selection → Classification

FIG 1: System Design

In this work, the Fuzzy C-Means clustering method is used to segment the lungs from the background. Defected tissues are extracted from the lung tissue as ROIs by finding the range of pixel intensity values that differentiates defected pixels from other lung pixels. If no defected tissue is present, the slice is considered normal. The GLCM approach is used to extract the features of an image: twenty-two texture features are extracted for each ROI in four orientations (0°, 45°, 90°, 135°) using the gray level co-occurrence matrix (GLCM), and the feature vector of each ROI is generated by combining the resulting eighty-eight features. A large number of features can thus be extracted. Modified PSO is used for feature selection, to reduce the number of features in the given feature set. Finally, the SVM classifier determines whether the image is a normal or an abnormal lung image.
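The way a per-ROI feature vector is assembled from four GLCM orientations can be illustrated with a minimal sketch. It is not the authors' MATLAB implementation: the function names (`glcm`, `texture_features`, `roi_feature_vector`), the choice of 8 grey levels, the symmetric normalisation, and the use of only four features per orientation (instead of the twenty-two used in this work) are all assumptions made for illustration.

```python
import numpy as np

# Pixel offsets (row, col) at distance 1 for the four orientations.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, offset, levels):
    """Normalised, symmetric grey-level co-occurrence matrix for one offset."""
    dr, dc = offset
    rows, cols = image.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    counts += counts.T  # count each pixel pair in both directions
    return counts / counts.sum()


def texture_features(p):
    """Four illustrative GLCM features (the paper extracts twenty-two)."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]  # skip empty cells so log2 is defined
    energy = np.sum(p ** 2)
    entropy = -np.sum(nonzero * np.log2(nonzero))
    contrast = np.sum((i - j) ** 2 * p)
    homogeneity = np.sum(p / (1 + np.abs(i - j)))
    return [energy, entropy, contrast, homogeneity]


def roi_feature_vector(roi, levels=8):
    """Concatenate the features of all four orientations into one vector."""
    features = []
    for angle in (0, 45, 90, 135):
        features.extend(texture_features(glcm(roi, OFFSETS[angle], levels)))
    return np.array(features)


# A tiny quantised ROI (integer grey levels below `levels`), for illustration.
roi = np.array([[0, 1], [0, 1]])
vector = roi_feature_vector(roi, levels=2)  # 4 features x 4 orientations
```

With twenty-two features per orientation, the same concatenation yields the eighty-eight-element vector described above.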
IV. PROPOSED WORK

4.1 MATERIAL AND METHODS
In this study, a dataset of lung CT images comprising abnormal and normal lungs, taken from several patients, was used. The lung diseases were categorized by the radiologist from the CT images. Images were collected from male and female patients aged 15 to 78 years.

4.2 PRE-PROCESSING
Pre-processing removes unwanted noise and improves image quality; filtering is done at this stage. In the proposed system, a Wiener filter is used to remove noise. The Wiener filter preserves the edges and fine details of the lungs. It is a low-pass filter. A filter size of 5×5 is selected to avoid over-smoothing the image. The 2D Wiener filter is used to reduce additive Gaussian white noise in images. It estimates the local mean and variance around each pixel:

$$\mu = \frac{1}{NM} \sum_{n_1, n_2 \in \eta} a(n_1, n_2) \qquad (4.1)$$

$$\sigma^2 = \frac{1}{NM} \sum_{n_1, n_2 \in \eta} a^2(n_1, n_2) - \mu^2 \qquad (4.2)$$

where η is the N-by-M local neighbourhood of each pixel in the image.
A Gaussian filter is a linear low-pass filter. It is usually used to reduce edge blurring or to reduce noise. Gaussian smoothing is very effective for removing Gaussian noise. The weights give higher significance to pixels near the edge. The idea of Gaussian smoothing is to use the 2-D Gaussian distribution as a point-spread function, given by

$$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \qquad (4.3)$$

where σ is the standard deviation of the zero-mean distribution.

4.3 SEGMENTATION:
4.3.1 Fuzzy c-Means clustering:
Fuzzy C-Means is a soft clustering method. In many situations, fuzzy clustering is more natural than hard clustering, and it is a powerful unsupervised method for the analysis of data and the construction of models. Traditional clustering approaches generate partitions in which each pattern belongs to one and only one cluster, so the clusters in such a hard partition are disjoint. Fuzzy clustering extends this by associating each pattern with every cluster through a membership function.
The Fuzzy C-Means clustering algorithm was initially developed by Joe Dunn in 1973 and improved by Jim Bezdek in 1981. It consists of two main steps: calculating the cluster centers, and assigning points to these centers. A membership value in the range 0 to 1 is assigned to each data item for each cluster.
The algorithm is based on optimization of the objective function given by

$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{C} u_{ij}^{m} \lVert x_i - v_j \rVert^2 \qquad (4.4)$$

where x_i is the i-th of n data items, v_j is the j-th of C cluster centers, u_ij is the membership of x_i in cluster j, and m > 1 is the fuzziness exponent.

    Input:  I - image without noise
            C - number of clusters
    Output: S - segmented lungs
    Initialize ε;
    Initialize maxIterations
    Choose Euclidean distance metric
    Initialize m;
    Randomly initialize V0 = v1, v2, …, vc cluster centers;
    for t = 1 to maxIterations do
        Update the membership matrix U
        Calculate the new cluster centers Vt
        Calculate the new objective function J_m^t
        if (abs(J_m^t − J_m^(t−1)) < ε) then
            break;
        else
            J_m^(t−1) = J_m^t
        end if
    end for
    end

Fuzzy C-Means ALGORITHM

4.3.2 ROI EXTRACTION
ROIs are extracted with the help of radiologists and are thus validated for clinical relevance, which improves the performance of the system. The defected tissues are extracted from the lung as ROIs: the intensity levels of the pixels are found, and the range of pixel intensity values is used to differentiate defected tissue from other lung tissue. If no defected tissue is present, the slice is considered normal. The class label of each ROI is then obtained from the experts. Finally, the ROIs are extracted from the lung tissue together with their class label information.

4.4 FEATURE EXTRACTION:
4.4.1 GLCM Approach:
Feature extraction is based on texture features. The GLCM approach is used to extract features of the given image such as entropy, energy, contrast, correlation, variance, sum average, homogeneity and cluster shade, which are then considered for feature selection. These twenty-two features are extracted for each ROI in four orientations (0°, 45°, 90°, 135°) using the GLCM, also called the Grey Tone Spatial Dependency Matrix. The GLCM tabulates how often different combinations of pixel brightness
values (grey levels) occur in an image. The GLCM contains information about the positions of pixels having similar grey level values.
Energy: The sum of squared elements in the GLCM. It is one by default for a constant image.

$$\text{Energy} = \sum_{i,j} p(i, j)^2 \qquad (4.5)$$

Entropy: A statistical measure that describes the randomness of an image. A low entropy value means there is little variation in pixel contrast, and a high value indicates a greater difference between pixel values.

$$\text{Entropy} = -\sum_{i,j} p(i, j) \log_2 p(i, j) \qquad (4.6)$$

Mean: The mean of all the values in an image.

$$\mu = \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{p(i, j)}{MN} \qquad (4.7)$$

Standard Deviation: The average distance between pixel values and the mean.

$$\sigma = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} \frac{(p(i, j) - \mu)^2}{MN}} \qquad (4.8)$$

Skewness: A measure of the asymmetry of the probability distribution about its mean. The skewness of an image can be positive or negative.

$$SK = \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{[p(i, j) - \mu]^3}{MN\,\sigma^3} \qquad (4.9)$$

Homogeneity: A measure of the closeness of the distribution of GLCM elements to the GLCM diagonal. Its value lies between 0 and 1.

$$\text{Homogeneity} = \sum_{i,j} \frac{p(i, j)}{1 + |i - j|} \qquad (4.10)$$

Contrast: A measure of the intensity difference between a pixel and its neighbour over the whole image. It is zero for a constant image, and it is also known as variance and moment of inertia.

$$\text{Contrast} = \sum_{i,j} (i - j)^2\, p(i, j) \qquad (4.11)$$

Correlation: A measure of how correlated a pixel is to its neighbour over the whole image.

$$\text{Correlation} = \sum_{i,j} \frac{(i - \mu_i)(j - \mu_j)\, p(i, j)}{\sigma_i \sigma_j} \qquad (4.12)$$

In total, eighty-eight features are extracted from an image. These 88 features describe the characteristics of the segmented region. On the basis of these features, it can be clearly determined whether the region is an affected area that can be considered tuberculosis.

4.5 FEATURE SELECTION:
Feature selection deals with selecting, from the entire feature set, the subset of features that shows the best performance in classification accuracy. The best subset contains the least number of dimensions that contribute most to accuracy. The optimization search is carried out by Modified Particle Swarm Optimization.
4.5.1 Modified Particle Swarm Optimization:
The original PSO is a population-based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, in which each candidate solution is called a particle. In the proposed approach, a fitness function is used in each step of the algorithm. The algorithm is initialized with a group of random particles (solutions) and then searches for the optimal solution by updating generations. In every iteration, each particle is updated by following two "best" values. The first is the best fitness value the particle itself has achieved so far; this value is called pbest. The other "best" value tracked by the particle swarm optimizer is the best value obtained so far by any particle in the population. This best value is a global best and is called gbest. When a particle takes only part of the population as its topological neighbors, the best value is a local best and is called lbest. Initially, the particles' velocities are set to zero and their positions are set randomly within the boundaries of the search space. The velocity and position of each particle are then calculated using equations (4.13) and (4.14), the position being updated with the updated velocity.

v[] = v[] + c1 * rand() * (pbest[] - present[]) + c2 * rand() * (gbest[] - present[])    (4.13)

present[] = present[] + v[]    (4.14)

Here v[] is the particle velocity and present[] is the current particle position. pbest[] and gbest[] are defined as stated before. rand() is a random number in (0, 1), and c1, c2 are learning factors. Each particle's position and velocity are updated in every iteration to obtain the optimized result. The algorithm for the Modified PSO is given below.

    P = Particle_Initialization();
    for i = 1 to it_max
        for each particle p in P do
            fp = f(p);
            if fp is better than f(pBest)
                pBest = p;
            end
        end
        while generation < maxGenerations do
            compute speed();
            update position();
            mutation();
            evaluation();
            update particle memory();
            generation++;
        end
        gBest = best p in P;
        for each particle p in P do
            v = v + c1*rand*(pBest - p) + c2*rand*(gBest - p);
            p = p + v;
        end
    end
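The update rules (4.13) and (4.14), together with the mutation and memory-update steps of the listing above, can be sketched in Python as follows. This is a hedged illustration, not the implementation used in this work: the function name `mpso_select`, the thresholding of each coordinate at 0.5 to obtain a feature mask, the random re-draw used as the mutation step, the parameter defaults, and the toy fitness function are all assumptions. In practice the fitness would be a classification accuracy measured on the selected features.

```python
import numpy as np


def mpso_select(fitness, n_features, n_particles=20, max_generations=50,
                c1=2.0, c2=2.0, mutation_rate=0.05, seed=0):
    """Return the best boolean feature mask found and its fitness (higher is better)."""
    rng = np.random.default_rng(seed)
    # Random positions inside the search space [0, 1]; velocities start at zero.
    position = rng.random((n_particles, n_features))
    velocity = np.zeros_like(position)

    def to_mask(x):
        # Assumption: a feature is selected when its coordinate exceeds 0.5.
        return x > 0.5

    pbest = position.copy()
    pbest_fit = np.array([fitness(to_mask(x)) for x in position])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(max_generations):
        r1 = rng.random(position.shape)
        r2 = rng.random(position.shape)
        # Equation (4.13): velocity update.
        velocity = velocity + c1 * r1 * (pbest - position) + c2 * r2 * (gbest - position)
        # Equation (4.14): position update, clipped to the search space.
        position = np.clip(position + velocity, 0.0, 1.0)
        # Mutation (assumed form): re-draw a small fraction of coordinates.
        mutate = rng.random(position.shape) < mutation_rate
        position = np.where(mutate, rng.random(position.shape), position)
        # Evaluation and particle-memory (pbest, gbest) update.
        fit = np.array([fitness(to_mask(x)) for x in position])
        improved = fit > pbest_fit
        pbest[improved] = position[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    return to_mask(gbest), gbest_fit


# Hypothetical toy fitness: reward three "useful" features, penalise subset size.
useful = np.zeros(10, dtype=bool)
useful[[1, 4, 7]] = True


def toy_fitness(mask):
    return np.sum(mask & useful) - 0.1 * np.sum(mask)


mask, score = mpso_select(toy_fitness, n_features=10)
```

The size penalty in the toy fitness mirrors the stated goal of finding the smallest subset that contributes most to accuracy.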

4.6 Classification Subsystem:

4.6.1 Support Vector Machine Classifier: In many practical problems, the Support Vector Machine (SVM) plays a major role in achieving high classification accuracy, and it has good fault-tolerance and generalization capabilities. Because of its strong generalization performance, the SVM has drawn much attention and has been applied successfully in recent years. Rather than minimizing the mean square error over the data set, it aims at minimizing a bound on the generalization error of a model. Here, we get 92.30 % accuracy using SVM.

V. RESULTS AND DISCUSSIONS

The CT images used for training and testing the classifier were collected from AARTHI SCANS & LABS at TIRUNELVELI. Of the several CT images available, 100 images are used in this work, of which 50 images have tuberculosis and the remaining images do not. The images are segmented using FCM. The sets of tuberculosis (TB) CT images and non-tuberculosis CT images are tested to give an accurate result. Thus the technique deals with the accurate detection of tuberculosis.

5.1 Processing time analysis
To implement the proposed algorithm, we used MATLAB (R2016a) on a laptop with an Intel Core i3 (2.0 GHz) processor and 4 GB of memory. The resolution of the images in our database was 512 x 512. To evaluate the proposed algorithm, we analyzed the processing time of each step. Table 1 presents the average processing time (PT) of each module in the proposed algorithm.

Module               Processing time using PSO (s)   Processing time using MPSO (s)
Preprocessing        1.3974                          1.0972
FCM Segmentation     9.0871                          8.9789
Feature Extraction   0.9876                          0.5432

Fig 5.1 shows the processing time analysis of various modules such as Preprocessing, Segmentation and Feature Extraction using the proposed and the existing approach.

(a) Input (b) Gaussian filter (c) Wiener filter (d) Segmented lungs (e) ROI extraction

Performance Analysis

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Specificity = TN / (TN + FP)
Sensitivity = TP / (TP + FN)
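As a worked check of these three formulas, the short Python sketch below evaluates them on the counts reported in Table 2 for the SVM classifier with MPSO-selected features (TP = 47, TN = 46, FP = 4, FN = 3). The function name `classification_metrics` is introduced here only for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, specificity and sensitivity as percentages."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    specificity = 100.0 * tn / (tn + fp)
    sensitivity = 100.0 * tp / (tp + fn)
    return accuracy, specificity, sensitivity


# Counts from Table 2: SVM classifier applied to MPSO-selected features.
acc, spec, sens = classification_metrics(tp=47, tn=46, fp=4, fn=3)
print(acc, spec, sens)  # 93.0 92.0 94.0
```

The printed values agree with the Accuracy, Specificity and Sensitivity rows of Table 2 for that column.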
Here, TP is the number of actual positives correctly identified by the system, FP is the number of actual negatives labeled as positives, TN is the number of actual negatives correctly identified as negatives, and FN is the number of actual positives missed and labeled by the system as negative. Table 2 presents the classification performance for normal and abnormal lungs for 100 samples.

Parameter         Applying MPSO        Applying PSO
                  SVM   KNN   MLP      SVM   KNN   MLP
TP                47    46    43       43    43    40
TN                46    45    45       40    41    42
FP                4     5     5        10    9     8
FN                3     4     7        7     7     10
Accuracy (%)      93    91    88       83    84    82
Sensitivity (%)   94    92    86       86    86    80
Specificity (%)   92    90    90       80    82    84

Fig 5.2 shows the accuracy analysis of various classifiers using the proposed and the existing approach.

VI. CONCLUSION
In this work, the images are pre-processed and then segmented by the FCM clustering algorithm. FCM is one of the distinctive clustering algorithms, and several algorithms have been developed based on it. The implementation of this algorithm achieves more accuracy, and a Gray Level Co-occurrence Matrix-based feature extraction technique was described. The texture features serve as the input for classifying the image accurately. Effective use of these multiple features and the selection of a suitable classification method are significant for improving accuracy.

VII. FUTURE WORK
In this work, we used a Support Vector Machine for classification. In future, genetic and fuzzy algorithms can be used to optimize the classifier, since they increase the possibility of reaching the global optimal solution.

VIII. REFERENCES
[1] Stefan Jaeger, Alexandros Karargyris, Sema Candemir, Les Folio, Jenifer Siegelman, “Automatic Tuberculosis Screening Using Chest Radiographs,” IEEE Transactions on Medical Imaging, vol. 33, no. 2, February 2014.
[2] C. Bhuvaneswari, P. Aruna, D. Loganathan, “Classification of Lung Diseases by Image Processing Techniques Using Computed Tomography Images,” International Journal of Advanced Computer Research (ISSN (print): 2249-7277, ISSN (online): 2277-7970), Volume-4, Number-1, Issue-14, March 2014.
[3] Shuangfeng Dai, Ke Lu, Jiyang Dong, “Lung segmentation with improved graph cuts on chest CT images,” 2015 3rd IAPR Asian Conference on Pattern Recognition.
[4] Laurens Hogeweg, Clara I. Sánchez, Pragnya Maduskar, Rick Philipsen, Alistair Story, “Automatic Detection of Tuberculosis in Chest Radiographs Using a Combination of Textural, Focal and Shape Abnormality Analysis,” IEEE Transactions on Medical Imaging, vol. 34, no. 12, December 2015.
[5] Shenshen Sun, Wenbo Li, Yan Kang, “Lung Nodule Detection Based on GA and SVM,” 2015 8th International Conference on Bio Medical Engineering and Informatics (BMEI 2015).
[6] G. Vijaya, A. Suhasini, R. Priya, “Automatic Detection of Lung Cancer in CT Images,” IJRET: International Journal of Research in Engineering and Technology.
[7] Elmar Rendon-Gonzalez and Volodymyr Ponomaryov, “Automatic Lung Nodule Segmentation and Classification in CT Images Based on SVM,” International conferences on 2016 IEEE.
[8] Mumini Olatunji Omisore proposed the genetic neuro-fuzzy inferential model for the diagnosis of tuberculosis. IEEE transactions on 2014.
[9] Girisha A, M C Chandrashekhar, Dr. M Z Kurian, “Texture Feature Extraction of Video Frames Using GLCM,” International Journal of Engineering Trends and Technology (IJETT), Volume 4, Issue 6, June 2013.
[10] Sema Candemir, Stefan Jaeger, Kannappan Palaniappan, Jonathan P. Musco, “Lung Segmentation in Chest Radiographs Using Anatomical Atlases With Non-rigid Registration,” IEEE Transactions on Medical Imaging, vol. 33, no. 2, February 2014.
[11] Marius George Linguraru, William J. Richbourg, Jianfei Liu, Jeremy M. Watt, “Tumor Burden Analysis on Computed Tomography by Automated Liver and Tumor Segmentation,” IEEE.
