Automatic Detection of Tuberculosis based on Adaboost Classifier and Genetic Algorithm

Mrs.R.Beaulah Jeyavathana1 ·
Dr.R.Balasubramanian2

Abstract

Tuberculosis is one of the most common diseases in developing countries. Early-stage
diagnosis of tuberculosis plays a significant role in curing TB patients. The work presented in this paper
focuses on the design and development of a system for the detection of tuberculosis in CT lung images.
With the help of an automated tuberculosis detection system, radiologists can diagnose the disease more
easily. The main objective of this paper is to obtain the optimal feature descriptor, taken to be the best
solution selected by means of genetic programming. Five stages are used to detect tuberculosis:
pre-processing of the image, segmentation, feature extraction, feature selection and classification.
These stages are used in medical image processing to enhance TB identification. In the feature extraction
stage, wavelet based statistical texture features are extracted, and from the extracted feature sets the
optimal features are selected by a genetic algorithm. Finally, the Adaboost classifier is used for image
classification. Experiments were carried out and intermediate results were obtained.
Experimental results show that Adaboost is a good classifier, giving an accuracy of 93% for classifying
TB-affected and non-affected lungs using wavelet based statistical texture features.

Keywords Tuberculosis, Otsu method, GLCM approach, Genetic Algorithm, Adaboost classifier.

Biographical Notes
Beaulah Jeyavathana R received her M.E degree in Computer Science and Engineering from
Sathyabama Deemed University, Chennai, India, and her B.E degree in Computer Science and
Engineering from Dr.Sivanthi Adithanar College of Engineering, Tiruchendur, India. Currently, she
is pursuing a PhD at Manonmaniam Sundaranar University, Tirunelveli, India, and is
working as an Assistant Professor (Sr.Grade) at Mepco Schlenk College of Engineering, Sivakasi,
India. She has 10 years of teaching and research experience. Her research interests include machine
learning, soft computing and software engineering. She has six international and one national
publication.
Balasubramanian R received his PhD degree in Computer Science and Engineering from
Manonmaniam Sundaranar University, Tirunelveli, India. He is currently working as Senior
Professor and Head at Manonmaniam Sundaranar University, Tirunelveli, India. He has 25 years of
teaching and research experience. His research interests include digital image processing and data
mining. He has sixteen international and one national publication.

1 Assistant Professor (Sr.Grade), Department of CSE
Mepco Schlenk Engineering College, Sivakasi
India
Mob.: +91-8524091066
E-mail: rbeaulah@mepcoeng.ac.in
2 Senior Professor and Head, Department of CSE
Manonmaniam Sundaranar University, Tirunelveli
Phone No.: +9443695573
E-mail: rbalus662002@yahoo.com

1 Introduction

Tuberculosis is an infectious disease caused by Mycobacterium tuberculosis, and it most often affects
the lungs. The bacteria that cause TB are spread when an infected person coughs or sneezes. TB
affects people of all ages in all parts of the world[1]. The symptoms of active lung TB are cough with
sputum and, at times, blood, weight loss, chest pains, fever, weakness and night sweats. Tuberculosis
bacteria present in sputum samples are identified under a microscope. This test detects only half the
number of tuberculosis cases and cannot detect drug resistance. In 2015, 10.4 million people fell ill with
TB and 1.8 million died from the disease. Over 95% of TB deaths occur in low and middle-income
countries. Six countries account for 60% of the total, with India leading the count, followed by
Pakistan, China, Indonesia, Nigeria and South Africa. An estimated one million children became
ill with TB and 170,000 children died of TB (excluding children with HIV).
Even though several effective methods have been used to reduce the effect of TB, it is still the
third highest-ranked disease causing death every year, since only X-rays are used for the detection
process. X-rays do not easily reveal the early stage of tuberculosis[2]. TB is detected manually by
doctors or technicians looking at the X-rays. Looking at the images with the naked eye leaves
considerable room for wrong prediction of the intensity of the tuberculosis.
Because of such wrong predictions, automated detection of tuberculosis
is used. To overcome the problems in existing methods, CT lung images are used for diagnosis
of tuberculosis.
In previous years, physicians diagnosed the disease using the clinical data of patients and
laboratory test reports. This method of diagnosis is time consuming because
physicians must deal with imprecise and uncertain clinical data. So, to improve
decision making with clinical data and to reduce time consumption, an intelligent diagnosis
system is needed, one that makes it possible to develop an accurate description or model for each class
using the attributes in the dataset. It is always a problem for physicians to identify the disease
accurately. The proposed method is implemented with an Adaboost classifier for classification along
with a Genetic Algorithm.
In image processing, feature extraction is an important step and a special form of
dimensionality reduction. When the input data is too large to be processed and is suspected to
be redundant, it is transformed into a reduced set of feature representations. The
transformation of the input data into a feature set is called feature extraction. Features contain
information related to colour, shape, texture and context.
To overcome the problems in existing methods, CT lung images are used to diagnose
tuberculosis automatically, based on a Genetic Algorithm and an Adaboost classifier[4]. The Genetic
Algorithm[5] is an optimization search technique used to find optimal or
near-optimal solutions to difficult problems. It is based on the principle of survival of the fittest.
GA is used for selecting the best features after the feature extraction process. A genetic algorithm
iteratively generates new generations of programs by applying naturally occurring genetic operators
to a population of computer programs. The genetic operations include reproduction, crossover and
mutation. This continues until a satisfactory solution is obtained. The Adaboost classifier is used to
classify normal and abnormal images.
The rest of the paper is organized as follows: related work is reviewed in
Section 2, the proposed methodology is described in Section 3, and implementation and results
are presented in Section 4, followed by discussion and future directions.

2 Related work
D. Zachary et al. [6] presented an automated approach for detecting tuberculosis in conventional
posteroanterior chest radiographs. They extracted the lung region using a graph cut segmentation
method. For the extracted region, a set of texture and shape features is computed, which
enables the X-rays to be classified as normal or abnormal using a binary classifier. They measured the
performance of the system on two datasets: a set collected by the tuberculosis control program
of their local county's health department in the United States, and a set collected by Shenzhen
Hospital, China.
Marius George Linguraru et al. [7] proposed automated computation of hepatic tumor burden
from abdominal computed tomography (CT) images of diseased populations with
inconsistent enhancement. The automated segmentation of livers is addressed first. A novel 3-D
affine invariant shape parameterization is employed to compare local shape across organs. By
generating a regular sampling of the organ's surface, this parameterization can be effectively used
to compare features of a set of closed 3-D surfaces point-to-point, while avoiding common problems
with the parameterization of concave surfaces. From an initial segmentation of the livers, the areas
of atypical local shape are determined using training sets. A geodesic active contour locally
corrects the segmentations of the livers in abnormal images. Graph cuts segment the hepatic tumors
using shape and enhancement constraints. Liver segmentation errors are reduced significantly and
all tumors are detected. Finally, support vector machines and feature selection are employed to
reduce the number of false tumor detections. A tumor detection true positive fraction of 100%
is achieved at 2.3 false positives/case, and the tumor burden is estimated with 0.9% error.
S. Jaeger, A. Karargyris et al. [9] presented an automated approach for detecting tuberculosis
in conventional posteroanterior chest radiographs. They extracted the lung region using a
graph cut segmentation method. For this lung region, they computed a set of texture and shape
features, which enable the X-rays to be classified as normal or abnormal using a binary classifier.
They then measured the performance of their system on two datasets: a set collected by the
tuberculosis control program of their local county's health department in the United States, and
a set collected by Shenzhen Hospital, China. The proposed computer-aided diagnostic system for
TB screening, which is ready for field deployment, achieves a performance that approaches that
of human experts. They achieved an area under the ROC curve (AUC) of 87% (78.3%
accuracy) for the first set, and an AUC of 90% (84% accuracy) for the second set. For the first set,
they compared their system's performance with that of radiologists. When trying not to
miss any positive cases, radiologists achieve an accuracy of about 82% on this set, and their false
positive rate is about half of the system's rate.
M. Akhil Jabbar et al. [10] described the K-nearest neighbor (KNN) algorithm for pattern recognition.
They proposed a new algorithm which combines KNN with a genetic algorithm for effective
classification. Genetic algorithms perform global search in complex, large and multimodal landscapes
and provide an optimal solution. Experimental results show that their algorithm enhances the accuracy
of heart disease diagnosis.
C. Bhuvaneswari and D. Loganathan [18] proposed to detect lung diseases by effective feature
extraction through moment invariants and feature selection through a genetic algorithm, with the
results classified by Naive Bayes and decision tree classifiers. Pre-processing techniques are used to
remove noise, feature extraction obtains the useful features in a given image,
feature selection optimizes the top-ranking features that are relevant for the
image, and the classifiers are employed to classify the images; performance measures are then
reported for these classifiers.
Shuangfeng Dai, Ke Lu and Jiyang Dong [?] provided a new method for the detection of tuberculosis
based on an improved graph cuts algorithm, improving the graph cuts energy in both the regional
penalty term and the boundary penalty term. Laurens Hogeweg and Clara I. Sanchez [?] combined
textural, focal and shape abnormality subsystems into one system to deal with
the heterogeneous abnormality expression in different populations. The performance is evaluated
on a TB screening and a TB suspect database using both an external and a radiological reference
standard. The systems to detect different types of TB-related abnormalities and their combination
are described.
Wenbo Li and Yan Kang [17] used a new adaptive VOI selection method. Twenty-two features
were extracted to distinguish nodules from vascular endpoints or vascular cross structures, and an
optimal feature combination selection framework was designed based on an improved genetic algorithm
and support vector machine. The improved GA selects the optimal feature combination from the
feature pool to establish the SVM classifier. Cancer tumors are identified from lung CT images using
edge detection and boundary tracing. The feature extraction stage finds the size of
the tumor based on area, perimeter and irregularity index. The final stage is classification,
which separates tumors into benign or malignant. To classify the lung cancer,
data mining classification techniques such as SMO (Sequential Minimal Optimization), J48
decision tree and Naive Bayes are used. Once the classification is performed, the experimental
results of these techniques are compared to determine which one gives the most accurate
answers.
Elmar Rendon-Gonzalez [20] proposed a pre-processing step in which several masks are calculated
using thresholding and morphological operations, thereby eliminating background
and surrounding tissue. The suspicious Regions of Interest (ROI) are calculated using a priori
information and Hounsfield Units (HU). In feature extraction, numerous features are calculated
in order to restrict the suspicious zones. Finally, the Support Vector Machine (SVM) algorithm is
employed in the classification stage.
Shajy Lekshmanan et al. [24] proposed a methodology for the early detection of lung cancer
using lung cytology image analysis. Before segmentation, the images are pre-processed using
Contrast Limited Adaptive Histogram Equalisation (CLAHE). The enhanced images are segmented
using the Otsu segmentation method. Features are extracted from the
segmented PAP-stained sputum cytology images using Discrete Wavelet Transforms (DWT).
A good classification result depends on a good feature extraction and classification method.
The segmented images are classified using a Feed Forward Back Propagation Neural Network
(FFBNN). The experimental results show that FFBNN gave a better classification result using
features from the DWT-based matrix and obtained an accuracy of 92.9%.
V.M. Selvalakshmi and S. Nirmala Devi [25] proposed a new, rapid and efficient region-based
segmentation method for liver tumor segmentation, initialized using a spatial FCM clustering
technique. In Legendre level sets, the illumination of the area of interest is represented in a lower
dimensional subspace. A set of predefined basis functions, such as Legendre basis functions, is
used to represent the lower dimensional subspace. This kind of representation enables the robust
segmentation of heterogeneous objects even in the presence of noise. The proposed algorithm has
been compared with other existing algorithms, and its performance evaluation is carried out on CTA
abdomen images of various patients. The obtained results prove its effectiveness in low contrast
inhomogeneous tumor segmentation.
S. Saravanan et al. [26] proposed a work to remove the intricacies involved in demarcating
malignant from benign spiculated Solitary Pulmonary Nodules (SPN). Edges can be
classified as an irregular edge with corona radiata, lobulation and notching signs, or as a distinct,
soft, unclouded contour edge. These edges are hardly spotted in bronchial carcinoma. The paper
develops an algorithm for automatically detecting spiculated nodules using the BPN algorithm from
a given computed tomography (CT) lung image. To automate the detection of lung nodules,
parametric active contours are used in place of manual segmentation. Features are extracted from the
gray level co-occurrence matrix (GLCM) derived from the manually segmented lung nodule and used for
further classification as nodule or non-nodule/normal image. They classified spiculated nodules
as malignant or benign by fixing a threshold for the average image intensity after administering
contrast.
Shipra Saraswat et al. [27] described the use of a cross recurrence plot (CRP) toolbox for
computing the recurrence rate values of both healthy and unhealthy subjects, with the artificial
neural network (ANN) toolbox in Matlab used for generating accurate results. A radial basis
function neural network (RBFNN) is used to design the probabilistic neural network classifier
that discriminates normal from abnormal (VT) signals based on the recurrence rate values.
They also illustrated cross recurrence quantification analysis (CRQA) of ECG signals followed
by decomposition using the discrete wavelet transform (DWT) for the analysis of cardiac
disorders, with sensitivity and specificity of 98.5% and 97.6% respectively and an overall accuracy
of 98.7%.
A.R. Revathi and Dhananjay Kumar [28] proposed an activity recognition system using a hybridization
of the self-adaptive learning particle swarm optimization algorithm with a feed forward neural network
(SLPSO-FFNN). The system consists of four phases, namely Background Estimation
(BE), Object Segmentation (OS), Feature Extraction (FE) and Activity Recognition (AR). A
high quality background is generated in the BE phase. The OS model is then used to extract
the object from the videos, and an object tracking process tracks the object through
the overlapping detection scheme. From the tracked objects, the FE module extracts useful
features. Finally, the SLPSO-FFNN based approach is used to detect anomalies present in the videos.

3 Methodology
The data used in this work were collected from Aarthi Scans, Tirunelveli. The dataset
contains lung CT images of abnormal and normal lungs taken from several
patients. The lung diseases are categorized by the radiologist from the CT image. Images
were collected from male and female patients aged 15 to 78 years. All CT
images are 512 x 512 pixels in size and stored as DICOM (Digital Imaging and Communications
in Medicine) files.
In this work, wavelet based statistical feature extraction is used to extract the features of an
image. A genetic algorithm is used for feature selection, reducing the number of features in the
given feature set, with the KNN classifier serving as the objective function. In a genetic algorithm,
the population is made up of individuals. The genetic algorithm iteratively generates new generations by
applying naturally occurring genetic operators to the population. The genetic operations include
reproduction, crossover and mutation. A fitness value is calculated for each individual to estimate the
classification error rate, which is derived from the average accuracy obtained
through tenfold cross validation using KNN (K-Nearest Neighbor). Finally, the Adaboost classifier
is used to decide whether a lung image is normal or abnormal. The wavelet approach is
used to analyze the image in a multiscale representation. Texture classification experiments
have been performed using Haar filters, Daubechies filters and eight different biorthogonal filter
pairs. Wavelet based statistical texture features are extracted from the lung region. The extracted
features are optimized by the Genetic Algorithm, and the optimized features are fed to the Adaboost
classifier to detect the presence of tuberculosis.
3.1 Pre-processing
Pre-processing removes unwanted noise and improves image quality; at this
stage, filtering is done to remove noise. In the proposed system, a Wiener filter is used.
The Wiener filter is a low-pass filter that preserves the edges and fine details of the lungs.
A filter size of 5*5 is selected to avoid over-smoothing the image. The 2D Wiener filter is used for
reduction of additive Gaussian white noise in images. The Wiener filter estimates the local mean and
variance around each pixel,

$$\mu = \frac{1}{NM} \sum_{n_1, n_2 \in \eta} a(n_1, n_2) \tag{1}$$

and

$$\sigma^2 = \frac{1}{NM} \sum_{n_1, n_2 \in \eta} a^2(n_1, n_2) - \mu^2 \tag{2}$$

where η is the N-by-M local neighborhood of each pixel in the image A. wiener2 then creates a
pixelwise Wiener filter using these estimates,

$$b(n_1, n_2) = \mu + \frac{\sigma^2 - v^2}{\sigma^2} \left( a(n_1, n_2) - \mu \right) \tag{3}$$

where v^2 is the noise variance.
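Equations (1)-(3) can be illustrated with a minimal pure-Python sketch. It is not the wiener2 implementation: the function name `wiener_filter`, the replication of edge pixels at the border, the estimate of the noise variance as the average local variance when none is given, and the clamping of the gain to [0, 1] are all assumptions made for this illustration.

```python
def wiener_filter(image, size=5, noise_var=None):
    """Pixelwise adaptive Wiener filter following Eqs. (1)-(3).

    image is a list of rows of floats. Borders are handled by replicating
    edge pixels, which is an assumption of this sketch.
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    area = size * size
    mean = [[0.0] * cols for _ in range(rows)]
    var = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = total_sq = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr = min(max(r + dr, 0), rows - 1)  # clamp to the image
                    cc = min(max(c + dc, 0), cols - 1)
                    value = image[rr][cc]
                    total += value
                    total_sq += value * value
            mean[r][c] = total / area                                # Eq. (1)
            var[r][c] = max(total_sq / area - mean[r][c] ** 2, 0.0)  # Eq. (2)
    if noise_var is None:
        # Assumption: use the average local variance as the noise variance.
        noise_var = sum(map(sum, var)) / (rows * cols)
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Safeguard (assumed): keep the gain in [0, 1] and avoid 0/0.
            denom = max(var[r][c], noise_var)
            gain = max(var[r][c] - noise_var, 0.0) / denom if denom > 0 else 0.0
            out[r][c] = mean[r][c] + gain * (image[r][c] - mean[r][c])  # Eq. (3)
    return out
```

In flat regions the local variance is small relative to the noise variance, so the output approaches the local mean; near edges the local variance dominates and the pixel is left almost unchanged, which is why the filter preserves edges.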

3.2 Lung Extraction from CT Lung image


A CT lung image is of sufficient quality to detect many lung diseases and
abnormalities. A fixed threshold cannot be used to extract the lung region because
intensity differs with the patient, the slice and the CT machine. So adaptive thresholding,
based on histogram analysis, is used for each slice.
The mean of all pixel values in the image is selected
as the starting value for the iterative threshold computation in the next step. The histogram is drawn
and analyzed. The pixel counts of the intensities representing the darkest and brightest
values are set to zero to remove them. The intensity range with the highest pixel counts, plus a
certain margin to accommodate variance in the lung region, is adaptively obtained for each slice.
This range represents lung region pixels.
The pixels within the adaptive threshold range of intensity are extracted. The output looks like
scattered points and is not very clear, so it is converted to objects using morphological closing and
opening. The lungs are extracted along with unwanted fragments that are located near them and have
the same intensity as the lungs. Since these unwanted fragments can affect proper diagnosis, they are
removed based on area. After removing the fragments, the image is complemented and multiplied with
the original image to get the segmented lungs in the CT image.
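The histogram step can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the margin, so `margin=10` is a placeholder, an 8-bit intensity scale (0 to 255) is assumed, and the helper names `adaptive_lung_range` and `extract_mask` are hypothetical. Morphological closing and opening and the area-based fragment removal are not shown.

```python
from collections import Counter

def adaptive_lung_range(pixels, margin=10, dark=0, bright=255):
    """Return (low, high) around the most frequent remaining intensity."""
    counts = Counter(pixels)
    # Zero the counts of the darkest and brightest intensities.
    counts.pop(dark, None)
    counts.pop(bright, None)
    if not counts:
        raise ValueError("no intensities left after removing the extremes")
    # Highest count wins; ties go to the lower intensity.
    peak = max(counts, key=lambda level: (counts[level], -level))
    return max(peak - margin, dark), min(peak + margin, bright)

def extract_mask(image, low, high):
    """Binary mask of the pixels inside the adaptive intensity range."""
    return [[1 if low <= value <= high else 0 for value in row] for row in image]
```

Because the range is recomputed from each slice's own histogram, the same code adapts to intensity differences between patients, slices and scanners.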
3.3 ROI Extraction
ROIs are extracted with the help of radiologists, which validates their clinical relevance
and improves the performance of the system. The defective tissues are extracted from the lung as
ROIs; the intensity level of the pixels is then found, and the range of pixel intensity values is used
to differentiate the defective tissues from other lung tissues. If no defective tissues are present,
the slice is considered Normal. The class label for each ROI is then obtained from the
experts. Finally, the ROIs are extracted from the lung tissues together with their class label
information.
3.4 Wavelet based statistical texture feature extraction
Feature extraction is the transformation of input data into a set of features. In this work, after the
wavelet transform the original image I is represented by a set of sub-images at different scales,
$\{X_e, D_{bi}\}$, $i = 1, 2, 3$, $b = 1, \ldots, e$, which is a multiscale representation of depth e of
the image I. Each detail image $D_{bi}$ consists of wavelet coefficients $D_{bi}(c_i, c_j)$. Since the grey
level co-occurrence matrix (GLCM) is defined for an image with a countable number of gray levels, a
co-occurrence matrix $F_{bi}^{\theta}$ can be defined for each detail image. Element (l, m) of
$F_{bi}^{\theta}$ is the probability that a wavelet coefficient $D_{bi} = l$ co-occurs with a coefficient
$D_{bi} = m$ at a distance d in direction θ. Statistical features are then
extracted in the horizontal, vertical and diagonal directions (a total of 54 features).
The feature values are normalized by subtracting the minimum value and dividing by the maximum value
minus the minimum value. The maximum and minimum values are calculated from the training data set.
If a feature value is less than the minimum, it is set to the minimum, and if it is greater
than the maximum, it is set to the maximum. The values are then normalized, and the feature selection
algorithm is used to optimize the features.
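The normalization rule described above (minimum and maximum taken from the training set, out-of-range values clipped, then min-max scaling) can be sketched as follows. The function names `fit_min_max` and `normalize` are hypothetical, and mapping a constant feature to 0.0 is an assumption made to avoid division by zero.

```python
def fit_min_max(training_rows):
    """Per-feature minimum and maximum computed on the training set only."""
    columns = list(zip(*training_rows))
    return [min(col) for col in columns], [max(col) for col in columns]

def normalize(row, minima, maxima):
    """Clip each value to the training range, then scale it to [0, 1]."""
    scaled = []
    for value, lo, hi in zip(row, minima, maxima):
        clipped = min(max(value, lo), hi)
        # Assumption: a feature that is constant in training maps to 0.0.
        scaled.append((clipped - lo) / (hi - lo) if hi > lo else 0.0)
    return scaled
```

Clipping before scaling guarantees that test features stay inside [0, 1] even when they fall outside the range seen during training.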
3.5 Feature Selection


Feature selection deals with selecting, from the entire feature set, the subset of features
that gives the best classification accuracy. The best subset contains the least
number of dimensions that contribute most to accuracy. The optimization search is carried out by
the genetic algorithm, which is an efficient method for function minimization.
3.5.1 Genetic Algorithm
GA is used for selecting the best features after the feature extraction process. The genetic algorithm
iteratively generates new generations of programs by applying naturally occurring genetic operators to a
population of computer programs. A genetic algorithm has three operators: selection, crossover
and mutation. In selection, good individuals are selected for reproduction; crossover combines good
individuals to generate better offspring (children). Mutation changes a string locally to maintain
genetic diversity from one generation of a population of chromosomes to the next. In
each generation, the population is evaluated and tested for termination of the algorithm. If the
termination criterion is not met, the population is operated on by the three GA
operators and then re-evaluated. The basic genetic algorithm is as follows:

Algorithm 1 Part-I
Start
For initial population size do
    Create individual from available function sets and terminal sets
End
Until stopping criteria met or number of generations exceeded do
    (a) Calculate the fitness value of each individual in the population using the fitness
        function (here it is the classification accuracy rate).
    (b) Select the two individuals from the population that produce the best fitness
        values to participate in the genetic operations in (c).
    (c) Create new individuals for the population by applying the following
        genetic operations with specified probabilities:
        (i) Selection: Select the best individuals from the population by the tournament
            method.
        (ii) Crossover: Create new offspring for the new population by
            randomly interchanging the chosen parts of the two individuals
            selected in (b).
        (iii) Mutation: Create new offspring for the new population by randomly
            changing/mutating a randomly chosen part of any one of the individuals
            selected in (b).
End
Return the best feature descriptors (best individual)
End
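The loop in Algorithm 1 can be sketched generically in Python. This is an illustration under assumptions rather than the authors' code: the function `run_ga` and its parameters are hypothetical, the fitness is treated as an error to be minimized, the operators are passed in as callables, and the split of each generation into elite, crossover and mutation children is assumed from the later description of Elite kids and the crossover fraction.

```python
import random

def run_ga(population, fitness, select, crossover, mutate,
           generations=20, elite_count=2, crossover_fraction=0.8, rng=None):
    """Evolve binary chromosomes and return the one with the lowest fitness."""
    rng = rng or random.Random(0)
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness)        # lower error is better
        elite = [list(ind) for ind in ranked[:elite_count]]
        n_cross = round(crossover_fraction * (size - elite_count))
        n_mut = size - elite_count - n_cross
        # Crossover children: two selected parents per child.
        cross_kids = [crossover(select(ranked, fitness, rng),
                                select(ranked, fitness, rng), rng)
                      for _ in range(n_cross)]
        # Mutation children: one selected parent per child.
        mut_kids = [mutate(select(ranked, fitness, rng), rng)
                    for _ in range(n_mut)]
        population = elite + cross_kids + mut_kids
    return min(population, key=fitness)
```

Copying the elite unchanged into the next generation means the best fitness found so far can never get worse from one generation to the next.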

First, the chromosome is designed. Because each chromosome encodes fifty-four features,
it is a binary string of 0s and 1s whose length is fifty-four; 1 indicates that a feature is selected and
0 that it is not. Then the initial population is produced by generating N chromosomes randomly.
To ensure the diversity of individuals in the population, N=50 is
chosen. Secondly, the fitness function is designed. The KNN classifier is used for computing accuracy.
For each training example x in the training data set, find the K nearest neighbors in the
training data set based on the Euclidean distance between feature vectors A = (x_1, ..., x_m) and
B = (y_1, ..., y_m):

$$\mathrm{dist}(A, B) = \sqrt{\frac{\sum_{i=1}^{m} (x_i - y_i)^2}{m}} \tag{4}$$

Then predict the class value by finding the class most represented among the K nearest neighbours.

Algorithm 2 Fitness Function Evaluation

procedure fit()
    FeatureIndex ← indices of ones in BinaryChromosome
    NewDataSet ← DataSet indexed by FeatureIndex
    NumFeat ← number of elements in FeatureIndex
    NumNeighborskNN ← 3
    kNNError ← ClassifierKNN(NewDataSet, ClassInformation, NumNeighborskNN)
    return kNNError
end procedure
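A self-contained sketch of this fitness evaluation is given below. It assumes a 3-nearest-neighbour classifier with the distance of Eq. (4) and a tenfold cross validation with interleaved folds; the fold layout, the function names and the worst-case score of 1.0 for an empty feature subset are assumptions of this illustration.

```python
import math
from collections import Counter

def distance(a, b):
    """Eq. (4): root of the mean squared coordinate difference."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def knn_predict(train_x, train_y, query, k=3):
    """Majority class among the k training points closest to query."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: distance(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def fitness(chromosome, data, labels, k=3, folds=10):
    """Cross-validated kNN error rate on the features flagged by 1 bits."""
    index = [i for i, bit in enumerate(chromosome) if bit]
    if not index:
        return 1.0  # assumption: an empty feature subset gets the worst score
    reduced = [[row[i] for i in index] for row in data]
    errors = 0
    for fold in range(folds):
        test_ids = range(fold, len(reduced), folds)  # interleaved folds
        held_out = set(test_ids)
        train_x = [reduced[i] for i in range(len(reduced)) if i not in held_out]
        train_y = [labels[i] for i in range(len(reduced)) if i not in held_out]
        errors += sum(knn_predict(train_x, train_y, reduced[i], k) != labels[i]
                      for i in test_ids)
    return errors / len(reduced)
```

Because the value returned is an error rate, the genetic algorithm minimizes it; a chromosome that keeps only discriminative features scores close to 0.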

3.5.1.1 Selection Function

The aim of selection in the GA is to ensure that the population is constantly improved over all
fitness values. Selection helps the GA discard bad designs and keep only the
best individuals. There are many selection mechanisms in GA, the default being stochastic
uniform (with default size 4), but tournament selection of size 2 was used in this work due to its
simplicity, speed and efficiency. Tournament selection also enforces higher selection pressure on the
GA (resulting in a higher rate of convergence) and makes sure the worst individual does not get into
the next generation. In the GA, two functions are needed to perform tournament selection. The first
function generates the players (parents) needed in the actual tournament function, while the second
outputs the winner of the tournament. The fitness of the selected chromosomes is
ranked and the best of these becomes the winner. In tournament selection of size 2, two chromosomes
are selected from the population after the Elite kids (children) are taken out, and the better of the
two chromosomes (using fitness ranking) is selected. This is performed iteratively until the new
population is filled up.
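Tournament selection of size 2 can be sketched as below. The sketch assumes that fitness is an error rate (lower is better) and that the two players are drawn without replacement; the function names are hypothetical.

```python
import random

def tournament_select(population, errors, rng, size=2):
    """Pick `size` distinct random players and return the one with the lowest error."""
    players = rng.sample(range(len(population)), size)
    winner = min(players, key=lambda i: errors[i])
    return population[winner]

def fill_parents(population, errors, count, rng):
    """Run tournaments repeatedly until `count` parents have been chosen."""
    return [tournament_select(population, errors, rng) for _ in range(count)]
```

With distinct players and distinct error values, the worst individual loses every tournament it enters, which matches the remark that it never reaches the next generation.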
3.5.1.2 Crossover function

The crossover operator in the GA genetically combines two individuals (parents) to form
children for the next generation. Two parent chromosomes are needed to carry out the crossover
operation. The two chromosomes are taken from tournament selection. The GA uses a crossover
fraction, say XoverFrac, to specify the number of kids produced by the crossover function after the
Elite kids are removed from the current population being evaluated. The variable XoverFrac, as
discussed in the preceding section, is bounded by the inequality 0 ≤ XoverFrac ≤ 1. The value
used for XoverFrac in this work is 0.8, and the crossover function chosen is of arithmetic type. In
this case, an XOR operation is performed on the two parent chromosomes since they are binary.
CrossOverKid = p1 ⊕ p2
where,

• ⊕ is an XOR operator for binary operands;
• p1 = first parent needed by the crossover function;
• p2 = second parent needed by the crossover function;
The XOR operator ⊕ works as follows:


1⊕1=0
1⊕0=1
0⊕1=1
0⊕0=0
So for two binary operands (parent chromosomes);
p1 = 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 1
p2 = 0 1 1 1 1 0 1 1 1 1 0 0 0 1 1 1 1 0 1 1
CrossOverKid=p1 ⊕ p2
CrossOverKid = 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 1 1 0
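The worked example can be reproduced with a few lines of Python. The function name `xor_crossover` is hypothetical, and the chromosomes are stored as lists of 0/1 integers for the purpose of this sketch.

```python
def xor_crossover(p1, p2):
    """Bitwise XOR of two equal-length binary parent chromosomes."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    return [a ^ b for a, b in zip(p1, p2)]

# The two parents from the example in the text.
p1 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1]
p2 = [0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1]
kid = xor_crossover(p1, p2)
print(" ".join(map(str, kid)))
```

A child bit is 1 exactly where the parents disagree, so a feature selected by only one parent is kept and a feature selected by both is dropped.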
3.5.1.3 Mutation function

Mutation is a genetic perturbation of individuals in the population. It ensures genetic
diversity and the searching of a broader solution space. Uniform mutation is used in this work. For
uniform mutation, the GA generates a set of random numbers (RDs) from a uniform
distribution, one per gene, so that the set has Genome Length elements. The value of each random
number is associated with the position of a gene (bit) in the chromosome. The chromosome is scanned
from left to right, and for each bit the value of its RD is compared with the mutation probability
(denoted mp): if the RD at position i is less than mp, the gene (bit) at position i is flipped;
otherwise, the gene is left unflipped. This is repeated from the Least Significant Bit (LSB) to the
Most Significant Bit (MSB) of each chromosome in the mutation children. The optimum feature set,
selected on the basis of the test data set, is used to train the Adaboost classifier for classifying
tuberculosis.
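Uniform mutation as described here amounts to one random draw per gene. In the sketch below the function name `uniform_mutation` is hypothetical and the mutation probability `mp` is a free parameter, since its value is not stated in the text.

```python
import random

def uniform_mutation(chromosome, mp, rng):
    """Flip each bit whose uniform random draw falls below mp."""
    draws = [rng.random() for _ in chromosome]  # one RD per gene
    return [bit ^ 1 if rd < mp else bit for bit, rd in zip(chromosome, draws)]
```

With mp = 0 the chromosome is returned unchanged, and with mp = 1 every bit is flipped, because the draws lie in [0, 1).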
3.6 Classification
AdaBoost stands for Adaptive Boosting. AdaBoost is a machine learning meta-algorithm: it can
be used together with other machine learning algorithms to improve their performance.
Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak
classifiers. The output of a boosted classifier is the weighted sum of the outputs of its weak
classifiers. AdaBoost is sensitive to noisy data and outliers. It builds a committee of weak
classifiers by adaptively adjusting the weights at each iteration: the weights of the training
patterns classified correctly by a weak classifier are decreased, while the weights of the training
patterns misclassified by the weak classifier are increased. AdaBoost was selected based on its
strong theoretical basis, simplicity of implementation, transparency of feature selection, and
performance on discriminative tasks. The algorithm concurrently selects and combines relevant
features from the feature set during the training of each independent classifier, thus avoiding the
separate feature selection process common with other classification methods. The basic principle of
the algorithm is that any number of weak classifiers with an error rate less than 50% can be
combined to form an ensemble classifier whose error rate approaches that of an optimal classifier.
The Adaboost algorithm performs well because of its ability to generate expanding diversity: in
order to improve the performance of the final ensemble, AdaBoost combines diverse weak classifiers.
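The weight-adjustment scheme described above can be illustrated with a minimal sketch of discrete AdaBoost. The choice of one-feature threshold rules (decision stumps) as weak classifiers, the function names and the six-sample toy data are assumptions made for this example; they are not taken from the implementation used in this work.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature_index, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                # The stump predicts `polarity` when feature j <= thr, else the opposite label.
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if (polarity if row[j] <= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] <= thr else -polarity

def adaboost_train(X, y, n_rounds=3):
    """Train a weighted committee of stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)         # guard against division by zero
        if err >= 0.5:                     # weak learner no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Decrease weights of correctly classified samples, increase the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the weighted sum of the weak classifier outputs."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data: one hypothetical texture feature; no single threshold separates the classes.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost_train(X, y, n_rounds=3)
print([adaboost_predict(model, row) for row in X])
```

On this toy set a single stump misclassifies two of the six samples, while the weighted vote of three stumps separates the classes, which is the behaviour the paragraph above attributes to boosting.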
3.7 Performance Evaluation
The performance of the system is evaluated against the diagnosis of the radiologist. If both the
diagnosis and the test are positive, the case is called a True Positive (TP). The probability of a
TP is estimated by counting the true positives in the sample and dividing by the sample size. If
the diagnosis is positive and the test is negative, the case is called a False Negative (FN). False
Positive (FP) and True Negative (TN) are defined similarly. Accuracy is given by
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP}$$

The values defined above (TP, TN, FP and FN) are used to calculate different measurements of the
quality of the test. The first one is sensitivity, SE, which is the probability of having a
positive test among the patients who have a positive diagnosis.

$$SE = \frac{TP}{TP + FN}$$

Specificity, SP, is the probability of having a negative test among the patients who have a
negative diagnosis.

$$SP = \frac{TN}{FP + TN}$$

Two other measurements that can be used are the positive predictive value (PPV) and the negative
predictive value (NPV).

$$PPV = \frac{TP}{TP + FP}$$

$$NPV = \frac{TN}{TN + FN}$$
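The measures defined in this section can be computed directly from the four counts. The following sketch is illustrative only; the function name `diagnostic_metrics` is an assumption, and the counts used in the example are those reported for Adaboost with wavelet features in Table 2.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Compute the quality measures of Section 3.7 from the four outcome counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fn + fp),
        "sensitivity": tp / (tp + fn),   # positive tests among positive diagnoses
        "specificity": tn / (fp + tn),   # negative tests among negative diagnoses
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Counts for Adaboost with wavelet features (Table 2): TP=47, TN=46, FP=4, FN=3.
metrics = diagnostic_metrics(tp=47, tn=46, fp=4, fn=3)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

For these counts the accuracy, sensitivity and specificity come out at 93%, 94% and 92%, in agreement with the values reported in Table 2.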

4 Implementation and Results


The CT lung images are collected from 50 normal (TB unaffected) and 50 abnormal (TB affected)
patients. The CT lung images collected from the patients were in DICOM format. They are
converted to BMP images, which is an acceptable format for medical image processing. The lungs
are extracted from the images using the segmentation method described earlier.
The lung regions are extracted successfully within a fraction of a second using the above
method. Doctors and radiologists evaluated the results. For most of the images, the lung region
is extracted completely, although there are some erroneous cases where unwanted fragments
are also extracted. This occurs because certain intensities are close to the middle intensity of
the lung region when the adaptive threshold is fixed. However, in most cases this did not have much
influence on the recognition. The segmentation technique is accurate and very simple in extracting
the lung boundary in most of the images because the morphological operations preserve the shape of
the lung. Hence the lung regions are correctly captured. The adaptive threshold technique solves
the problem of intensity varying with patient and CT machine parameters. The results are compared
with existing results. Most of the images are noise free. If there is noise in an image,
preprocessing is done to remove it by applying the Wiener filtering technique.
54 features are extracted as explained in the methodology. For feature selection, a genetic
algorithm is used to optimize the feature set. Based on the experiments conducted with the
available data, the genetic algorithm takes a minimal amount of time, performs the same sequence
of operations and always produces the same results. Since a classification accuracy of 95% is
obtained with the features entropy, mean, homogeneity, contrast and correlation extracted from the
images, which is a high level of performance in this domain, the cost of the classifier is reduced.
Wavelet based statistical feature extraction is also used, and the resulting features are analyzed.

Fig. 1: (a) CT lung image, (b) pre-processed image; (a) segmented lung, (b) ROI extraction

In our study, the proposed algorithm was implemented in MATLAB (R2016a) on a laptop with an
Intel Core i3 (2.0 GHz) processor and 4 GB of memory. To evaluate the proposed algorithm
efficiently, we analyzed the processing time of each step of the algorithm.

Fig. 2: Processing Time Analysis
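A per-stage processing time analysis of this kind can be obtained by timing each stage separately. The sketch below is only an assumption about how such a measurement might be organised: the helper `time_stages` is hypothetical, and the stage functions are placeholders that return their input unchanged, standing in for the five stages of the system.

```python
import time

def time_stages(stages, data):
    """Run each (name, function) stage in order and record its elapsed time in seconds."""
    timings = {}
    for name, func in stages:
        start = time.perf_counter()
        data = func(data)                 # output of one stage feeds the next
        timings[name] = time.perf_counter() - start
    return data, timings

# Placeholder stages; real implementations would replace the identity functions.
stages = [
    ("pre-processing", lambda image: image),
    ("segmentation", lambda image: image),
    ("feature extraction", lambda image: image),
    ("feature selection", lambda features: features),
    ("classification", lambda features: features),
]

result, timings = time_stages(stages, data=[[0, 1], [1, 0]])
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```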



Table 1: Selected features using GEKNN (proposed)

Feature Selector              Iterations    Selected Features
GEKNN Algorithm (Proposed)    20            5, 7, 8, 13, 15, 17
                              50            8, 13, 15, 17
                              80            4, 9, 15, 22

The best features are selected in each iteration. The algorithm iteratively improves the solution
by moving towards the optimal solution. Using our proposed algorithm GEKNN, optimal features are
selected according to the number of iterations, and the results are shown in Table 1.

Table 2: Classification performance of normal and abnormal lungs for 100 samples (50 normal and
50 abnormal lungs)

                   Applying wavelet            Without applying wavelet
Parameter          SVM     MLP     Adaboost    SVM     MLP     Adaboost
TP                 46      43      47          43      40      43
TN                 45      45      46          41      42      40
FP                 5       5       4           9       8       10
FN                 4       7       3           7       10      7
Accuracy (%)       91      88      93          84      82      83
Sensitivity (%)    92      86      94          86      80      86
Specificity (%)    90      90      92          82      84      80
PPV (%)            90      89.5    92          82.7    83.3    89.5
NPV (%)            92      85.5    94          85.4    80.7    85.1

From Table 2 it is clear that, with wavelet based features, Adaboost outperforms MLP and SVM in
accuracy (93% against 88% and 91%). Its speed is also very good when compared to the other
classifiers. Hence Adaboost is the best classifier compared to SVM and MLP in terms of speed and
recognition rate. Sensitivity and accuracy values are calculated and analyzed. Depending on the
training set, a different threshold value determines the true positive and true negative cases. As
a result, there is a trade-off between the sensitivity and specificity values. The performance of
the Adaboost classifier was analyzed based on experiments with a training set of 30 patients and a
testing set of 30 patients in each category. For designing the classifier, a training set
containing 30 normal and 30 abnormal images and a testing set containing 50 normal and 50 abnormal
images are considered. Hence Adaboost is chosen as a good classifier in this application domain for
the detection of tuberculosis.
Figure 3 represents the classification performance and shows that the Adaboost classifier performs
better than the SVM and MLP classifiers.

5 Conclusion
The segmentation method is simple and accurate in extracting the lung boundary in most of
the slices because the morphological operations preserve the shape of the lung. An adaptive
threshold solves the problem of intensity varying with patient and CT machine parameters. Results
show that wavelet based statistical features yield better results than the statistical
features extracted directly from the image without applying the wavelet transform. This justifies
the choice of the wavelet transform. The recognition rate of the Adaboost classifier for
classifying normal and abnormal lung images using CT images is 93%. The sensitivity is 94% and the
specificity is 92%. Hence it is concluded that Adaboost classifiers supported by conventional image
processing operations can be effectively used for tuberculosis diagnosis. Use of larger databases
is expected to improve the system robustness and ensure the repeatability of the resulting
performance.
In future research work we plan to extend the method to other types of lung diseases, such as lung
tumors and lung cancers, and to other types of medical images, such as dual CT and MRI, using Deep
Neural Networks. Deep learning networks play a vital role in disease diagnosis, so we have decided
to use them to improve our classification accuracy. Such a classification method could
significantly decrease healthcare costs through early prediction and diagnosis of tuberculosis.

Fig. 3: Classification performance analysis

References
1. Sahoo P.K, Soltani S, Wong AK, Chen YC. A survey of thresholding techniques. Compute Vs Graph Image
Processing 1988; 41(2):233-60
2. A. Leung, N. Muller, P. Pineda, and J. FitzGerald, ”Primary tuberculosis in childhood: Radiographic
manifestations,” Radiology, vol.182, pp. 87-91, 1992.
3. B. van Ginneken, S. Katsuragawa, B. M. ter Haar Romeny, K. Doi, and M. A. Viergever, ”Automatic
detection of abnormalities in chest radiographs using local texture analysis, ”IEEE Trans. Med.Imag.,vol.
21, no. 2, pp. 139-149, Feb. 2002.
4. B.Van Ginneken, B.M.HaarRomeny, and M.A.Viergever, ”Automatic segmentation and texture analysis
of PA chest radiographs to detect abnormalities related to interstitial disease and tuberculosis,” Comput.
Assist. Radiol.Surg., pp. 685-688,2002.
5. K.Siddiqietal, ”Clinical diagnosis of smear-negative pulmonary tuberculosis in low-income countries: The
current evidence,”Lancet Infectious Diseases, vol. 3, p. 288, 2003.
6. S.A. Patil, V.R. Udupi, ”Geometrical and texture features estimation of lung cancer and TB images using
chest X-ray database” IJBET: International Journal of Biomedical Engineering and Technology 2011.
7. S.A. Patil, V.R. Udupi, ”Geometrical and texture features estimation of lung cancer and TB images using
chest X-ray database” IJBET: International Journal of Biomedical Engineering and Technology 2011.
8. Sunanda Gupta, S.K. Chakarvarti, ”Medical image registration based on fuzzy c-means clustering
segmentation approach using SURF” IJBET: International Journal of Biomedical Engineering and
Technology 2011.
9. M. Hariharan,M.P. Paulraj,Sazali Yaacob, ”Detection of vocal fold paralysis and oedema using time-domain
features and Probabilistic Neural Network” IJBET: International Journal of Biomedical Engineering and
Technology 2011.
10. D.Zacharyetal, ”Changes in tuberculosis notications and treatment delay in Zambia when introducing a
digital x-ray service,” Public Health Action, vol. 2, pp. 56-60, 2012.
11. Marius George Linguraru, William J. Richbourg, Jianfei Liu, Jeremy M. Watt, ”Tumor Burden Analysis
on Computed Tomography by Automated Liver and Tumor Segmentation” IEEE TRANSACTIONSON
MEDICAL IMAGING, VOL.31, NO.10, OCTOBER 2012.
12. Girisha A , M C Chandrashekhar, Dr. M Z Kurian, ”Texture Feature Extraction of Video Frames Using
GLCM” International Journal of Engineering Trends and Technology (IJETT) - Volume 4 Issue 6- June
2013.
13. S. Jaeger, A. Karargyris, S. Candemir, L. Folio, J. Sielgelman, F.Callaghan, Z. Xue, K. Palaniappan, R.
Singh, S. Antani, G. Thoma, Y.-X. Xiang, P.-X. Lu, and C. McDonald, ”Automatic tuberculosis screening
using chest radiographs,” IEEE Trans. Medical Imaging, 2013.
14. M.Akhil jabbar, B.L Deekshatulua Priti Chandra b, ”Classification of Heart Disease Using K- Nearest
Neighbor and Genetic Algorithm” Elsevier International Conference on Computational Intelligence:
Modeling Techniques and Applications (CIMTA) 2013.
14 Mrs.R.Beaulah Jeyavathana1 , Dr.R.Balasubramanian2

15. Girisha A , M C Chandrashekhar, Dr. M Z Kurian, ”Texture Feature Extraction of Video Frames Using
GLCM” International Journal of Engineering Trends and Technology (IJETT) - Volume 4 Issue 6-June
2013.
16. Mythily.A, Veena.M.U, ”Segmentation and Classification of Lung Tumour using Chest CT Image for
Treatment Planning” International Journal of Engineering Trends and Technology (IJETT) - Volume 7
Number 2- Jan 2014 ISSN: 2231-5381
17. Sema Candemir, Stefan Jaeger, Kannappan Palaniappan, Jonathan P. Musco, ”Lung Segmentation in
Chest Radiographs Using Anatomical Atlases With Non-rigid Registration” IEEE TRANSACTIONSON
MEDICAL IMAGING, VOL.33, NO.2, FEBRUARY 2014
18. C.Bhuvaneswari, P.Aruna, D.Loganathan, ”Classification of Lung Diseases by Image Processing Techniques
Using Computed Tomography Images” International Journal of Advanced Computer Research (ISSN
(print): 2249-7277 ISSN (online): 2277-7970) Volume-4 Number-1 Issue-14 March-2014.
19. Shuangfeng Dai, Ke Lu, Jiyang Dong ”Lung segmentation with improved graph cuts on chest CT images”
2015 3rd IAPR Asian Conference on Pattern Recognition.
20. Laurens Hogeweg, Clara I. Sanchez, Pragnya Maduskar, Rick Philipsen, Alistair Story,” Automatic
Detection of Tuberculosis in Chest Radiographs Using a Combination of Textural, Textural,
Focal and Shape abnormality Analysis ”IEEE TRANSACTIONS ON MEDICAL IMAGING,
VOL.34,NO.12,DECEMBER 2015
21. Shenshen Sun, Wenbo Li, Yan Kang ”Lung Nodule Detection Based on GA and SVM” 2015 8th
International Conference on Bio Medical Engineering and Informatics (BMEI 2015).
22. Bahareh Hahangian, Hossein Pourghassem ”Automatic brain hemorrage segmentation and classification
algorithm based on weighted grayscale histogram feature in a hierarchical classification structure”
Biocybern Biomed Engineering 36(2016) Elsevier
23. Elmar Rendon-Gonzalez and Volodymyr Ponomaryov ”Automatic Lung Nodule Segmentation and
Classification in CT Images Based on SVM” International conferences on 2016 IEEE.
24. Babatunde Oluleye, Diepeveen Dean, A Genetic Algorithm-Based Feature Selection International Journal
of Electronics Communication and Computer Engineering Volume 5, Issue 4, 2016 ISSN (Online):
2249071X, ISSN (Print): 22784209.
25. Preeyanan Pattrapisetwong and Werapon Chiracharit ”Automatic lung segmentation in chest radiographs
using shadow filter & local threholding” Computational Intelligence in Bioinformatics and Computational
Biology (CIBCB), 2016 IEEE Conference.
26. Robert A. Ochs, Jonathan G. Goldin, Fereidoun Abtin, Hyun J. Kim, Kathleen Brown, Poonam
Batra,bDonald Roback, ”Automated classification of lung broncho vascular anatomy in CT using
AdaBoost” HHS Public access.
27. Shajy Lekshmanan; Varghese Paul; P. Smitha; K. Sujathan, ”Classification of lung columnar cells using
feed forward back propagation neural network” IJBET: International Journal of Biomedical Engineering
and Technology 2016.
28. V.M. Selvalakshmi; S. Nirmala Devi, ”Improved fuzzy clustering and Legendre level sets for segmentation
of multiple tumours in low contrast liver CTA images” IJBET: International Journal of Biomedical
Engineering and Technology 2017.
29. S. Saravanan, G. Selvakumar, C. Amarnath, S. Udayabaskaran, S. Manikandan, ”CAD for demarcation of
malignant and benign nodules in CT lung images of spiculated nodules” IJBET: International Journal of
Biomedical Engineering and Technology 2017.
30. Shipra Saraswat; Geetika Srivastava; Sachidanand Shukla, ”Classification of ECG signals using
cross-recurrence quantification analysis and probabilistic neural network classifier for ventricular
tachycardia patients” IJBET: International Journal of Biomedical Engineering and Technology 2017.
31. A.R. Revathi; Dhananjay Kumar, ”Hybridisation of feed forward neural network and self-adaptive PSO
with diverse of features for anomaly detection” IJBET: International Journal of Biomedical Engineering
and Technology 2017.
32. Dilip Kumar Choubey; Sanchita Paul, ”GA RBF NN: a classification system for diabetes” IJBET:
International Journal of Biomedical Engineering and Technology 2017.
33. Nallasivan Gomathinayagam; Janakiraman Subbiah, ”Analysis of CT lung images using orthogonal moment
features IJBET: International Journal of Biomedical Engineering and Technology 2017.
34. G. Vijaya, A. Suhasini, R. Priya AUTOMATIC DETECTION OF LUNG CANCER IN CT IMAGES
IJRET: International Journal of Research in Engineering and Technology 2017
35. Marius George Linguraru, William J. Richbourg, Jianfei Liu, Jeremy M. Watt, ”Tumor Burden Analysis
on Computed Tomography by Automated Liver and Tumor Segmentation” IEEE 2017

You might also like