
Received: 21 November 2018 Revised: 8 May 2020 Accepted: 11 May 2020

DOI: 10.1002/ima.22445

RESEARCH ARTICLE

Morphological feature extraction and KNG-CNN classification of CT images for early lung cancer detection

Sanjukta Rani Jena1 | Selvaraj Thomas George2

1Department of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India
2Department of Biomedical Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India

Correspondence
Selvaraj Thomas George, Department of Biomedical Engineering, Karunya Institute of Technology and Sciences, Coimbatore 641 114, Tamil Nadu, India.
Email: thomasgeorge@karunya.edu, thomasgeorge29@gmail.com

Abstract
Lung cancer is a life-threatening disease responsible for many deaths. Precise classification and differential diagnosis of lung cancer are essential, yet achieving stable and accurate identification remains challenging. A classification scheme for lung cancer in CT images was developed using a kernel based non-Gaussian convolutional neural network (KNG-CNN). The KNG-CNN comprises three convolutional, three pooling and two fully connected layers. Kernel based non-Gaussian computation is used to reduce the false positives encountered in the work. The Lung Image Database Consortium image collection (LIDC-IDRI) dataset supplies the input images, and ROI based segmentation using an efficient CLAHE technique is carried out as a preprocessing step, enhancing the images for better feature extraction. Morphological features are then extracted from the segmented images. Finally, the KNG-CNN method is used for effectual classification of tumours >30 mm. An accuracy of 87.3% was obtained with this technique, showing that the method is effectual for classifying lung cancer from CT scanned images.

KEYWORDS
automatic detection, CLAHE, CT, kernel based non-Gaussian convolutional neural networks, lung cancer, morphological, ROI

1 | INTRODUCTION

Lung cancer is one of the most life-threatening diseases around the world. The death count due to lung cancer is higher than that of breast, colon and prostate cancer. Early identification of cancerous nodules is a significant clinical indication for diagnosing lung cancer, as these nodules have a high probability of being malignant.1 In general, solitary pulmonary nodules refer to abnormalities in the lung that appear as approximately spherical, rough opacities up to 30 mm in diameter.

A significant advantage of computer aided detection (CAD) systems is that they can assist radiologists, improve workflow and reduce error outcomes. CAD is an approach that identifies suspicious regions (masses, nodules and polyps) in certain regions of the body and reports their locations to radiologists.2 CAD has turned into a foremost research field in diagnostic and medical imaging and is applied to diverse medical imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI) and ultrasound imaging. In general, a CAD system for cancer diagnosis and detection (ie, polyp, lung, breast) has three significant stages: region of interest (ROI) based segmentation, feature extraction and classification. Nodule classification after feature extraction acts as a false positive reduction step.

Present CAD systems for classification have attained high sensitivity levels and are capable of enhancing the radiologists' performance in nodule characterization in CT sections.

Int J Imaging Syst Technol. 2020;1–13. wileyonlinelibrary.com/journal/ima © 2020 Wiley Periodicals LLC 1
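The three-stage CAD pipeline described above (ROI segmentation, feature extraction, classification) can be sketched as follows. This is a simplified illustration only; all stage functions and thresholds are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of a three-stage CAD pipeline:
# ROI segmentation -> feature extraction -> classification.
# Every function body here is an illustrative placeholder.

def segment_roi(ct_slice):
    """Return candidate region pixels from a CT slice (placeholder rule)."""
    threshold = 0.5  # hypothetical intensity threshold
    return [px for px in ct_slice if px > threshold]

def extract_features(roi):
    """Compute simple descriptors for a candidate region (placeholder)."""
    area = len(roi)
    brightness = sum(roi) / area if area else 0.0
    return {"area": area, "brightness": brightness}

def classify(features):
    """Label a candidate as nodule / non-nodule (placeholder rule)."""
    return "nodule" if features["area"] >= 3 else "non-nodule"

def cad_pipeline(ct_slice):
    roi = segment_roi(ct_slice)
    return classify(extract_features(roi))

print(cad_pipeline([0.1, 0.7, 0.9, 0.8, 0.2]))  # prints "nodule"
```

The point of the sketch is the staged structure: each stage consumes the previous stage's output, and the final classification step is where false positive reduction (the focus of this paper) takes place.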

However, present schemes for nodule identification provide numerous false positives (FPs).3,4 This is because detection procedures are sensitive, so certain structures (eg, blood vessels) are unavoidably labeled as nodules during the detection step. Since radiologists investigate every object identified, it is highly desirable to eliminate FPs where possible while maintaining the true positives (TPs).

FP reduction aims to remove FPs while maintaining high sensitivity. It is a classification between non-nodules and nodules, aiming to produce novel methods that distinguish suspicious regions accurately by reducing FPs with machine learning approaches.5 The purpose of the classification step is to learn a system's ability to predict the unknown output class of a suspicious nodule with good generalization. Classification therefore plays a vital role in FP reduction in CAD detection and diagnosis techniques. Deep learning is used for both classification and feature extraction in diverse fields6,7 such as computer vision and speech. In this work, a kernel based non-Gaussian convolutional neural network (KNG-CNN) is proposed for classifying nodules of tumors >30 mm using the LIDC-IDRI database.

1.1 | Contribution of the work

1 Nodule representation using ROI segmentation with an efficient contrast limited adaptive histogram equalization (CLAHE) technique for discrimination between false and true nodules. Although it is complex to attain a good feature representation, the anticipated method provides the finest ROI feature representation.
2 After performing ROI segmentation, morphological features are extracted from the segmented image.
3 Finally, the KNG-CNN is applied for eliminating FPs and efficiently classifying tumors >30 mm.
4 It helps the radiologist to find the affected nodules efficiently; the convolutional neural network is capable of identifying an extensive range of nodule representations.

1.2 | Organization of the work

The remainder of the work is organized as follows: Section 2 describes the state of the art of existing deep learning techniques; Section 3 explains the detailed methodology of the proposed work; Section 4 shows the experimental outcomes of the proposed work and a comparison with existing work; Section 5 concludes this investigation.

2 | RELATED WORKS

In Reference 8, Karan Sharma describes feature extraction and image processing from images to make precise predictions, which is challenging and rewarding at the same time: it enables characterizing objects and foreseeing variations in images precisely with deep learning. Furthermore, by utilizing advanced CNN techniques, it is now conceivable to construct AI that assists in decreasing the workload of radiologists, hospitals, experts and clinical professionals. By concentrating image processing on a unique region for training data to anticipate appropriate outcomes, acquiring progressive gains over prevailing schemes, and making use of therapeutically precise segmentation strategies, it will be conceivable to accomplish 95% precision or more for an extensive variety of patient datasets.

In Reference 9, Atsushi Teramoto built up a computerized classification scheme for lung tumors in microscopic images utilizing a DCNN. Simulation demonstrated that around 70% of images were classified correctly. The outcomes demonstrate that the DCNN is valuable for classification of malignant growth in cytodiagnosis.

In Reference 10, QingWeng Song exploited and widely assessed three essential deep neural networks. The performance in classification of malignant and benign pulmonary nodules was analyzed on LIDC-IDRI. The trial results suggest that the CNN achieved the best performance compared with deep neural networks (DNN) and stacked autoencoders (SAE). The layers of the neural networks in that paper are relatively small because of the constraints of the data sets. The proposed strategy can be expected to enhance precision on other databases, and can be generalized to the design of high performance CAD frameworks for other medical imaging tasks.

In Reference 11, Kingsley Kuan notes that lung disease identification in full 3D CT scans is challenging. Because of imperfect datasets, this methodology utilized LUNA16 to train a nodule detector, and a detector with the KDSB17 dataset to give global features. Kuan utilizes these together with local features from different nodule classifiers for distinguishing lung malignancy with high precision in competition, placing 41st out of 1972 teams (top 3%).

In Reference 12, Hamada R. H. Al-Absi introduced a strategy for lung cancer diagnosis based on a cluster-k-nearest-neighbor algorithm. The reported outcomes exhibit the capability of the strategy, with a malignant classification accuracy of 96.58% accomplished so far. Additional experiments with more wavelet functions will be done to build the precision of the strategy. In addition, to further enhance the outcomes, the curvelet transform will be used in the framework in future trials and contrasted with wavelets.

In Reference 13, Prajwal Rao foresees that the translation invariance of CNNs can be exploited to characterize

lung cancer screening thoracic CT examinations efficiently. By utilizing CNNs, one can renounce the dull procedure of manually extracting features for classification, which requires particular domain knowledge. The reported classification accuracy demonstrates that CanNet beats both the ANN and LeNet designs for the given classification task. Further experimentation on different hyper parameters of the CNN is possible with the goal of increasing the precision. This work presents an initial move toward automating the grading methodology, eventually becoming superior to trained experts at this especially urgent and basic task.

In Reference 14, Zirong Li depicts a strategy to identify chest radiograph masses utilizing deep learning, and builds up a manually labeled database. Contrasting two systems based on different architectures, the Faster RCNN design can adequately recognize and localize chest masses, and the feature extraction part utilizing RESNET shows superior execution. Li classifies the images as infected or healthy, and the deep learning strategy demonstrates the superior outcome. The author did nodule localization and characterization; the deepest structure utilized had just five convolution layers, named AlexNet. Finally, the object identification strategy can be utilized to perform target detection tasks on grayscale radiographs.

In Reference 15, Ryota Shimizu and Shusuke Yanagawa attempted to apply deep learning to human urine information and accomplished 90% exactness in the determination of lung cancer. This work demonstrated that deep learning is likewise powerful for human vital-data analysis and can do pre-diagnosis with no special medical knowledge.

In Reference 16, Md. Badrul Alam Miah proposed a technique involving binary thresholding and feature extraction, after which these features are utilized to train a neural network and test it. The anticipated framework effectively distinguishes lung malignant growth from CT images. The framework was tested on 150 kinds of CT lung images and acquired an overall success rate of 96.67%, which meets the desired outcome.

In Reference 17, Sardar Hamidian builds a CADe framework for lung cancer identification utilizing 3D CNNs that works in screening and discrimination stages. A 3D fully convolutional CNN is utilized for processing the whole CT and creating ROI candidates. The CNN consequently classifies every candidate region as background/nodule. This engineering yields an 800-fold speedup compared with a brute force strategy of sliding the 3D CNN over the volume to acquire scores for the entire CT. Pulmonary nodules show an extensive variety of shapes and sizes.

3 | PROPOSED METHODOLOGY

3.1 | Method

In this section, the efficient CLAHE method for ROI segmentation and the KNG-CNN for classifying the error rate or FP prediction are discussed. In this investigation, morphological feature extraction was considered. The selected features are taken as the input for the proposed deep learning technique. The deep learning features were executed in the MATLAB environment. Abnormality of the nodules was predicted with the KNG-CNN using the output morphological feature vectors, which are then merged to provide the classification output.

3.2 | Dataset

The dataset considered in this work is the LIDC-IDRI dataset, which comprises 1010 thoracic CT scans with diagnosis reports and size reports, acting as a resource for medical imaging research. The LIDC radiologists' annotations incorporate freehand outlines of nodules ≥3 mm in diameter on every CT slice where the nodules are visible, with subjective ratings of features on a 5 to 6 point scale. Annotations also include single marks (approximate centers) of nodules <3 mm in diameter along with non-nodules ≥3 mm.

In the preliminary stage of extracting the ROI, the geometric center is evaluated from the margins of the region marked in the database. Then, the region size is checked against 32 × 32. If the size is less than 32 × 32, a rectangular region of that size is segmented with the same geometric center as the marked region. Else, a larger size (ie, 64 × 64) is acquired as the candidate ROI and then downsampled to 32 × 32. Non-nodule regions are extracted in a similar way to generate negative samples for the testing and training process.

So as to examine the efficacy of neural networks for diverse image sizes, a dataset of 64 × 64 patches is made with a similar procedure. As an outcome, the ROI image patches extracted from the LIDC lung images comprise 40 772 nodules and 21 720 non-nodules.

3.3 | ROI extraction

In this investigation, high resolution CT scanned images are considered as input; noise owing to external substances is eradicated while obtaining the scanned lung image. Moreover, it is necessary to extract the ROI from the lung image to classify nodules efficiently. The ROI is a region of the

image that is provided to further operations such as classification and feature extraction. ROIs are defined using binary masks by setting pixel values to 1 for the ROI-related image region and 0 for the other regions of the image. More than a single ROI can be acquired from an image. ROIs are selected based on the range of intensities that emphasize contiguous pixels. Traditional image enhancement techniques such as histogram equalization are very effectual and simple: histogram equalization explicitly changes the gray levels of an image according to its probability distribution function and expands the dynamic range of the gray distribution to improve the visual effect of the image.

Based on the conventional histogram equalization algorithm, this work introduces an effectual CLAHE (E-CLAHE) to analyze the gray level mapping of an image. Henceforth, various definitions for gray level, threshold value setting and lung nodule identification are initiated. Following this, the entropy associated with information is used as the target function to attain the parameter value β in the mapping algorithm. With respect to threshold settings, the anticipated CLAHE identifies the image gray levels and adaptively alters two adjacent gray levels in the new histogram. Hence, an effectual improvement in the visual effect of the image is attained for further processing.

3.4 | CLAHE analysis for ROI segmentation

The obtained CT image is transformed to a gray scale image. In general, the intensity values of input images differ, thus an effectual CLAHE is utilized to equalize the image. In E-CLAHE, images are initially partitioned into smaller tiles, and based on the intensity values the contrast is enhanced in each tile. This assists in making features that are hidden in an image more visible. Contrast limited AHE (CLAHE) differs from traditional histogram equalization in limiting the contrast. The same limiting can also be applied to global histogram equalization, giving rise to contrast limited histogram equalization, which is rarely utilized in practice.

In CLAHE, the contrast limiting process has to be applied to every neighborhood from which a transformation function is derived. CLAHE was devised to eliminate the overall noise amplification that cannot be avoided by adaptive histogram equalization (AHE), as in Figure 1; this is attained by limiting the contrast enhancement of AHE. The contrast amplification at a given pixel value is determined by the slope of the transformation function, which is proportional to the cumulative distribution function (CDF) of the neighborhood and hence to the histogram value at that pixel value. CLAHE restricts amplification by clipping the histogram at a predefined value before evaluating the CDF, which restricts the CDF slope and the transformation function. The value at which histograms are clipped is termed the clip limit; it depends on the histogram normalization and thereby on the size of the neighborhood region. The clip point is evaluated as in Equation (1):

β = (M/N) × (1 + (α/100) × Smax)    (1)

where M is the pixel count in every block, N the number of gray levels in the block, Smax the maximum slope, and α the clip factor.

When α is near 0, the clipping point (CP) tends to M/N, so the pixels in the block remain constant. When α nears 100, the contrast is improved to a large degree. Therefore, the CP is the primary factor regulating the improvement in contrast. From the CDF, the mapping function used to remap the gray levels of the block image is obtained as in Equations (2) and (3):

cdf(l) = Σ (k = 0 to l) pdf(k)    (2)

T(l) = cdf(l) × lmax    (3)

where T(l) is the remapping function, l the gray level, and lmax the maximum pixel value in the block.

With respect to the CDF of the redistributed histogram in each block, diverse remapping functions are attained. To avoid artifacts, every pixel value is interpolated between mapping functions. Let a, b, c and d be the center pixels of the blocks, and let p be a pixel bounded by these blocks. The remapped pixel p obtained by bilinear interpolation is given in Equation (4):

T(p(i)) = m·(n·Ta(p(i)) + (1 − n)·Tb(p(i))) + (1 − m)·(n·Tc(p(i)) + (1 − n)·Td(p(i)))    (4)

where T(·) is a remapping function and p(i) is the value of pixel i with coordinate (x, y).

The interpolation step eliminates artifacts. Owing to the block-wise processing, E-CLAHE attains low computational complexity for the enhancement.

3.5 | Feature extraction

After pre-processing the image to eliminate noise, features are extracted from the images to identify and grade probable cancers.16 Feature extraction is one of the significant steps in examining CT images. Features may be extracted at the cell level or tissue level of CT images for enhanced predictions. To better capture the morphological information, this work considers several shape features, detailed below.

FIGURE 1 A, ROI segmented image of lung using efficient CLAHE for image enhancement. B, Step-by-step process of lung cancer detection. CLAHE, contrast limited adaptive histogram equalization; ROI, region of interest
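Returning to the CLAHE procedure of Section 3.4, the per-block clipping and remapping of Equations (1) to (3) can be sketched in code. This is an illustrative sketch only: the parameter values (`alpha`, `s_max`) are arbitrary, the excess-redistribution step is one common convention rather than the paper's exact rule, and the bilinear interpolation of Equation (4) is omitted.

```python
import numpy as np

# Sketch of the per-block E-CLAHE steps: clip the block histogram at the
# clip point of Equation (1), then remap gray levels through the clipped
# CDF as in Equations (2) and (3). Interpolation (Equation 4) is omitted.

def clip_point(m_pixels, n_levels, alpha, s_max):
    # Equation (1): beta = (M / N) * (1 + (alpha / 100) * Smax)
    return (m_pixels / n_levels) * (1.0 + (alpha / 100.0) * s_max)

def clahe_block_remap(block, n_levels=256, alpha=40.0, s_max=4.0):
    hist, _ = np.histogram(block, bins=n_levels, range=(0, n_levels))
    beta = clip_point(block.size, n_levels, alpha, s_max)
    excess = np.maximum(hist - beta, 0).sum()
    hist = np.minimum(hist, beta) + excess / n_levels  # redistribute clipped mass
    pdf = hist / hist.sum()
    cdf = np.cumsum(pdf)            # Equation (2)
    lut = cdf * (n_levels - 1)      # Equation (3): T(l) = cdf(l) * lmax
    return lut[block.astype(np.intp)]

block = np.random.default_rng(0).integers(0, 256, size=(8, 8))
out = clahe_block_remap(block)
print(out.shape)  # (8, 8)
```

Clipping the histogram before computing the CDF is what bounds the slope of the remapping function and hence the local contrast amplification, as described in the text.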

Area, brightness, diameter, perimeter, elongation and so on serve as features to reflect nuclei irregularity in CT images. Cell-level features focus on quantifying the properties of individual cells without measuring the spatial dependency among them. In CT images, shape and morphological features are considered for extraction. Based on these characteristics, the significant shape and morphological features are illustrated as follows.

1 Area (A): The area is specified by the total number of pixels in the nucleus region, as shown in Equation (5):

A = Σ (i = 1 to n) Σ (j = 1 to m) B(i, j)    (5)

where A is the nucleus area and B the segmented image with i rows and j columns.

2 Brightness: The average intensity value of the pixels belonging to the nucleus region is taken as the nucleus brightness.

3 Nucleus longest diameter (NLD): The diameter of the largest circle circumscribing the nucleus region is termed the nucleus longest diameter; it is given in Equation (6):

NLD = √((x1 − x2)² + (y1 − y2)²)    (6)

where (x1, y1) and (x2, y2) denote the end points of the major axis.

4 Nucleus shortest diameter (NSD): This is specified via the diameter of the smallest circle circumscribing the nucleus region, as represented in Equation (7):

NSD = √((x2 − x1)² + (y2 − y1)²)    (7)

where (x1, y1) and (x2, y2) denote the end points of the minor axis.

5 Nucleus elongation: This is the ratio of the shortest diameter to the longest diameter of the nucleus region, shown in Equation (8):

Nuclear elongation = NSD / NLD    (8)

6 Nucleus perimeter (P): The length of the perimeter of the nucleus region is given by Equation (9):

P = even count + √2 (odd count) units    (9)

7 Nucleus roundness (β): The ratio of the nucleus area to the area of the circle associated with the nucleus longest diameter is termed nucleus roundness, as shown in Equation (10):

β = 4π × Area / P²    (10)

8 Solidity: Solidity is the proportion of the actual cell/nucleus area to the convex hull area, shown in Equation (11):

Solidity = Area / Convex area    (11)

9 Eccentricity: The ratio of the major axis length to the minor axis length is termed eccentricity, as defined in Equation (12):

Eccentricity = Length of major axis / Length of minor axis    (12)

10 Compactness: Compactness is the ratio of the area to the square of the perimeter, as formulated in Equation (13):

Compactness = Area / perimeter²    (13)

3.6 | Classification

In this investigation, the kernel based non-Gaussian CNN method is used for classification. Two essential factors are considered for the classification purpose:

1 The nonlinear process data are projected onto a high-dimensional linear feature space to acquire feature data,18,19 whitening the feature data while selecting features for classification.
2 An independent component analysis strategy is used over the whitened data to extract features. After extracting the data, the FPs are reduced using the kernel based non-Gaussian CNN, which incorporates the following offline modeling stage:

1 Initially, accumulate data under normal working conditions and partition the process data into validation data and training data.
2 From the training data, develop the kernel based non-Gaussian model to acquire training samples.
3 Fit the PDF of every non-Gaussian model as in Equation (14) and evaluate the parameters based on the training samples:

Pi(s) = γi g0(s; σ²i,1) + (1 − γi) g0(s; σ²i,2), 1 < i < a    (14)

where g0(s; σ²) denotes the zero mean Gaussian pdf with variance σ², and σ²i,1 and σ²i,2 denote the two variances.

4 Compute the probabilities of the training samples for every kernel based variance, and determine the probability threshold for every sample.
5 Set the η value to 0.5 initially.
6 Acquire weight values for the training samples by Equation (15):

wi,t = η if wi,t > wi,lim; 1 − η if wi,t < wi,lim    (15)

where 0 < η < 0.5 denotes a prespecified value.
7 Use the attained weight values to calculate the statistics of the training data and determine the α confidence limits by the kernel based non-Gaussian model.
8 From the validation data, construct the KICA model to attain validation samples.
9 Compute the probabilities of the validation samples.
10 Acquire weight values for the validation samples.
11 With the attained weight values, calculate the statistics for the validation data. Compare the evaluated statistics with the confidence limits to attain the related false alarm rates.
12 For every statistic, if the corresponding false alarm rate exceeds the confidence range, the η value is increased or decreased within the range 0 < η < 0.5 and the procedure returns to Step 6; otherwise, modeling is finished.

Through the above procedure, the training, validation and processing data are normalized with the variances and means of the process variables computed from the training data.

3.7 | Architecture

The KNG-CNN architecture utilized for classifying the type of cancer is shown in Figure 2. It comprises three pooling layers (3-PL), three convolution layers (3-CL) and two fully connected layers. CT images are given to the KNG-CNN input layer. The number of filters, filter size and stride of each layer are specified in Figure 2. For instance, convolution layer 1 utilizes 32 filters with a 5 × 5 × 3 kernel, resulting in a feature map of 256 × 256 × 32 pixels; pooling layer 1 performs subsampling that takes the maximum value in a 3 × 3 kernel for every two pixels, decreasing the feature map size to 128 × 128 × 32 pixels. Here, CLAHE is used for image interpolation, which helps in estimating unknown pixel points from known data; with this segmentation, a smaller region of the lung nodule is enlarged to carry out further processing. Every convolution layer is followed by a rectified linear unit.20-23 After the 3-CL and 3-PL, there are two fully connected layers comprising a multilayer perceptron. Probabilities of the cancer types (adenocarcinoma, squamous cell carcinoma and small cell carcinoma) are acquired by a softmax function.24-26 During training, the dropout method is employed (dropout rate = 50% for the fully connected layers) to eradicate over-fitting. The batch size of the CNN is set to 16; it is considered a hyper-parameter to tune in deep learning. A larger batch size leads to faster computation but poorer generalization, while a smaller batch size provides faster convergence with better results; the batch size also affects convergence of the global optimization of the objective function. Therefore, during experimentation, the batch size should be raised steadily through training to reap better optimality.

4 | RESULTS AND DISCUSSION

The anticipated methodology for identification and recognition of cancer from CT scanned images comprises image enhancement using ROI segmentation, morphological feature extraction and classification stages.

FIGURE 2 Proposed KNG-CNN architecture model with three convolutional layers, three ReLU layers, three max-pooling layers, a fully connected layer and a softmax layer. KNG-CNN, kernel based non-Gaussian convolutional neural network [Color figure can be viewed at wileyonlinelibrary.com]
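The layer dimensions quoted in Section 3.7 (256 × 256 × 32 after convolution layer 1, 128 × 128 × 32 after pooling layer 1) can be checked with a small shape-bookkeeping sketch. The 'same' padding for convolutions and the stride-2 pooling used below are assumptions inferred from the quoted sizes, not values stated by the paper.

```python
# Shape bookkeeping for the first stages of the KNG-CNN layer stack.
# Assumes 'same'-padded convolutions and stride-2 pooling, which
# reproduce the 256x256x32 -> 128x128x32 example given in the text.

def conv_same(h, w, c_in, n_filters):
    # A 'same'-padded convolution keeps the spatial size and
    # changes only the channel count.
    return h, w, n_filters

def max_pool(h, w, c, stride=2):
    # A 3x3 max pool applied every `stride` pixels halves each
    # spatial dimension and keeps the channel count.
    return h // stride, w // stride, c

shape = (256, 256, 3)                    # input CT patch (3 channels assumed)
shape = conv_same(*shape, n_filters=32)  # conv layer 1 -> (256, 256, 32)
shape = max_pool(*shape)                 # pool layer 1 -> (128, 128, 32)
print(shape)  # (128, 128, 32)
```

Chaining the remaining conv/pool pairs the same way gives the input size of the first fully connected layer, which is how such architectures are usually dimensioned.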

capability to highlight ROI in images as trained and training sets. It is used to access how model outcomes are
tested through experimentation. To preserve information generalized for given dataset. In 10-fold cross validation,
from CT images during ROI segmentation process, vari- data is partitioned into 10 equal folds. However, 10 itera-
ous parameters were examined. In CT images, it is essen- tions of validating and training are carried out for itera-
tial to discover nuclei information to make accurate tion, here 9-folds are utilized for training and diverse data
diagnosis and detection based on features selected. From fold is performed for validation. Therefore, 900 data/sam-
the analysis and results, the KNG-CNN were outperforms ples were considered for training purposes and 100 data/
well in contrast to the prevailing techniques (Figure 3). samples were considered for testing purposes. The pro-
During feature extraction process, various clinically posed method was also tested by using kernel based func-
significant morphology based features were extracted tion with 10-fold cross validation techniques. In KNG-
from segmented images. At last, 2828 are sum of images CNN classification model, kernel's parameters and soft
in dataset and 115 are total amount of features to be margin parameter play significant role in classification
extracted as in Figure 3. process; best combination of features was chosen by grid
Here 1400 data/samples were cast off for testing the search with exponentially growing sequences of parame-
classification algorithms. The purpose of selection sam- ters. Each parameter combination selected was checked
ples (1400) is to show that this experimentation is per- by cross validations (10-fold), and parameters with finest
formed in MATLAB environment with 70:30 ratio for cross validation accuracy were selected. The performance
testing and training. Computation with all samples in metrics like sensitivity, specificity and accuracy were
dataset leads to computational complexity and training is computed using fundamental definitions as illustrated
performed in available tools in MATLAB. While choosing below:
random samples, there are some bias error in learning Accuracy: Classification accuracy is defined as num-
assumptions, that is, higher bias may mislead feature ber of samples classified perfectly (ie, TN and TP) and
values and target output. Variance is considered as error evaluated as in Equation (16):
in sensitivity that shows fluctuations in training set. To
TP + TN
overcome this bias error, threshold computation has to be Accuracy = × 100 ð16Þ
N
done as in classification section. 10-fold cross validation
method was utilized to partition data during testing and where “N,” total number of sample images.

FIGURE 3 The left image shows a sample lung nodule from the slice of a patient. The right image shows a part of our input raw training lung dataset [Color figure can be viewed at wileyonlinelibrary.com]
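The 10-fold protocol described above (data split into 10 equal folds; in each of 10 iterations, 9 folds train the model and the remaining fold validates it) can be sketched as follows. The fold handling here is a generic illustration, not the authors' MATLAB code.

```python
# Generic 10-fold cross-validation index split, as described in the text:
# partition n samples into k equal folds; in each of k iterations, k-1
# folds form the training set and the remaining fold the validation set.

def k_fold_indices(n_samples, k=10):
    fold_size = n_samples // k
    folds = [list(range(i * fold_size, (i + 1) * fold_size)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

# With 1000 samples this matches the 900-train / 100-test split in the text.
splits = list(k_fold_indices(1000, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 900 100
```

In practice the sample order would be shuffled before splitting; the deterministic split above keeps the sketch easy to verify.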

Sensitivity: Sensitivity is defined as the proportion of positive samples that are classified correctly. It is computed by Equation (17):

Sensitivity = TP / (TP + FN)    (17)

The sensitivity value ranges between 0 and 1, representing the worst and best classification, respectively.

Specificity: Specificity is defined as the proportion of negative samples that are correctly classified. The specificity value is computed by Equation (18):

Specificity = TN / (TN + FP)    (18)

The value ranges from 0 to 1, signifying the worst and best classification, respectively.

F-measure: The F-measure is the harmonic mean of recall and precision. It is defined as in Equations (19) to (21):

Precision = TP / (TP + FP)    (19)

Recall = TP / (TP + FN)    (20)

F-measure = 2 × (Precision × Recall) / (Precision + Recall)    (21)

The F-measure value ranges between 0 and 1, where 0 and 1 refer to the worst and best classification, respectively (Figure 4).

FIGURE 4 Comparison of the proposed KNG-CNN model: A, Accuracy; B, Precision; C, F-measure; D, Recall. KNG-CNN, kernel based non-Gaussian convolutional neural network [Color figure can be viewed at wileyonlinelibrary.com]

TABLE 1 Comparison table for accuracy rate of proposed KNG-CNN with the existing methods

Iterations 1 2 3 4 5
KNG-CNN 87.3 86.72 85.94 85.94 80.24
CNN 70.5033 72.5020 68.0946 70.2183 66.2808
ANFIS 68.1284 70.1718 72.9791 72.6339 68.2526
RBFNN 71.1261 70.0309 71.1543 58.3285 55.2894

Abbreviation: KNG-CNN, kernel based non-Gaussian convolutional neural network.

TABLE 2 Comparison table for recall of proposed KNG-CNN with existing methods

Iterations 1 2 3 4 5
KNG-CNN 93.44 95.78 95.77 96.22 97.65
CNN 92.6080 96.3981 92.4182 90.4951 94.8040
ANFIS 88.7926 87.7907 87.2125 89.4476 90.0392
RBFNN 88.3902 88.0547 84.0969 90.6069 95.5547

Abbreviation: KNG-CNN, kernel based non-Gaussian convolutional neural network.

TABLE 3 Comparison table for precision of proposed KNG-CNN with existing methods

Iterations 1 2 3 4 5
KNG-CNN 75.66 78.32 79.88 80.25 80.99
CNN 60.4165 63.0079 61.4314 60.5681 60.8367
ANFIS 56.4966 55.0309 61.7227 55.8731 62.0831
RBFNN 57.8012 51.5135 59.9361 53.7929 50.6863

Abbreviation: KNG-CNN, kernel based non-Gaussian convolutional neural network.

From all the above observations, it is concluded that KNG-CNN produces better outcomes than the other methods for CT scanned images. The maximum accuracy values attained are 87.3%, 86.72%, 85.94%, 85.94% and 80.24%, respectively. The learning rate of the CNN was varied during testing to determine the optimal rate, such that it does not cause any premature saturation in the CNN; the learning rate is set to 1e−2 and 1e−4, while the decay is set to 1e−6. The learning-rate range has to be kept small, as too large a rate leads to overfitting. Table 1 provides a comparative analysis of the proposed framework with other standard methods, namely CNN, adaptive neuro-fuzzy inference system (ANFIS) and radial basis function neural network (RBFNN). From Table 1, it is identified that the proposed technique performs better than all the other methods. Based on this comparison,27 CNN is a deep learning architecture that has been extensively used for feature extraction and classification; however, its 2D feature extraction has certain drawbacks. ANFIS is specifically used for feature selection and classification; it uses if-then rules for coupling the input-output lung cancer prediction, but in some cases it yields less optimal cancer prediction and also carries a risk in choosing the k-values related to feature selection. RBFNN shows significant drawbacks in handling the lung image resolution and in kernel approximation.

The maximum recall values attained are 93.44, 95.78, 95.77, 96.22 and 97.65, respectively. Table 2 provides the corresponding comparison of the proposed framework with CNN, ANFIS and RBFNN; from Table 2, it is identified that the anticipated technique performs better than all the other methods.

The maximum precision values attained are 75.66, 78.32, 79.88, 80.25 and 80.99, respectively. Table 3 provides the corresponding comparison, and it is noticed that the anticipated method performs better than all the other methods (Figures 5 and 6).

The maximum F-measure values attained are 84.55, 86.23, 86.66, 87.26 and 89.25, respectively. Table 4 provides the corresponding comparison, from which it is observed that the anticipated technique performs better than all the other methods.

From Table 5, it is noted that various deep learning models have been used for nodule-type classification, while the proposed model shows an enhanced accuracy of 87.3% when analyzing tumors of >30 mm size. While based

on References 28-30, various nodule types such as GGO, >30 mm and all types of pulmonary nodules are considered. As well, a CNN architecture is used in the other approaches, while the proposed model uses KNG-CNN. Previous investigations considered GGO and other nodule types and attained superlative outcomes, whereas the proposed model is trained on nodules of >30 mm and achieved an accuracy of 87.3%. Here, KNG-CNN is modeled to separate malignant from benign nodules using the LIDC-IDRI database. Figure 6 shows the ROC computation of the anticipated model.

FIGURE 5 Samples of the proposed KNG-CNN method. KNG-CNN, kernel based non-Gaussian convolutional neural network

FIGURE 6 ROC curve for various methods with KNG-CNN. KNG-CNN, kernel based non-Gaussian convolutional neural network; ROC, receiver operating characteristic [Color figure can be viewed at wileyonlinelibrary.com]

5 | CONCLUSION

This investigation introduces a novel research methodology known as KNG-CNN for the effectual classification of lung cancer nodules. At first, ROI extraction is performed to acquire an improved visual image of the lung nodule; to perform this, an enhanced CLAHE step is applied in this examination. Results show that the anticipated method provides superior input for the subsequent processes such as feature extraction and classification. The classification technique then provides a more flexible way for lung nodule prediction, where the multiple features present in the environment (here the work considers morphological features alone) would otherwise lead to an inaccurate prediction rate. Classification accuracy is improved by introducing the proposed KNG-CNN, which can ensure the optimal and accurate detection of lung nodules through effectual classification. The entire research method is implemented in the MATLAB simulation environment, from which it is proved that the proposed KNG-CNN technique can ensure the accurate prediction of lung nodules with reduced computational overhead compared with the existing research methods. In future, the work will be extended to a big data platform with hybrid datasets for performance enhancement by satisfying all the performance metrics.
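The ROC analysis of Figure 6 is built by sweeping a decision threshold over the classifier's malignancy scores and recording (1-specificity, sensitivity) pairs. A minimal pure-Python sketch of that construction; the scores and labels below are illustrative, not the paper's data:

```python
# Build ROC points by thresholding classifier scores: for each threshold,
# count true positives and false positives among samples scoring >= thr.

def roc_points(scores, labels):
    """Return (fpr, tpr) pairs, one per unique score threshold (descending)."""
    pos = sum(labels)            # number of positive (malignant) samples
    neg = len(labels) - pos      # number of negative (benign) samples
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))  # (1-specificity, sensitivity)
    return points

if __name__ == "__main__":
    # Hypothetical scores and ground-truth labels for six test nodules.
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 0]
    print(roc_points(scores, labels))
```

A curve hugging the top-left corner, as reported for KNG-CNN in Figure 6, corresponds to high sensitivity being reached at a low false-positive rate.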

TABLE 4 Comparison table for F-measure of the proposed KNG-CNN with existing methods

Iterations    1          2          3          4          5
KNG-CNN       84.55      86.23      86.66      87.26      89.25
CNN           74.8658    75.8775    76.0299    71.7292    70.4756
ANFIS         69.9939    69.1074    75.1764    66.5339    72.5050
RBFNN         64.5835    69.0184    73.2799    63.7913    65.3276

Abbreviation: KNG-CNN, kernel based non-Gaussian convolutional neural network.
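The training discussion above reports learning rates of 1e−2 and 1e−4 with a decay of 1e−6; the exact schedule is not stated, so the sketch below assumes the common time-based rule lr_t = lr0 / (1 + decay * t) (an assumption, in the style of Keras' SGD decay), showing how the rate shrinks over iterations to avoid overfitting:

```python
# Time-based learning-rate decay: the rate halves once decay * t reaches 1.
# lr0 and decay mirror the hyperparameters reported in the text; the
# schedule itself is an assumed convention, not stated in the paper.

def decayed_lr(lr0: float, decay: float, iteration: int) -> float:
    """Learning rate after `iteration` updates under time-based decay."""
    return lr0 / (1.0 + decay * iteration)

if __name__ == "__main__":
    lr0, decay = 1e-2, 1e-6
    for t in (0, 100_000, 1_000_000):
        print(f"iteration {t:>9}: lr = {decayed_lr(lr0, decay, t):.6f}")
```

With decay 1e−6, the rate stays close to its initial value for the first few thousand updates and only halves after a million updates, which is consistent with the paper's note that the learning-rate range must be kept small.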

TABLE 5 Deep learning algorithms applied to LIDC-IDRI

Author            Malignant   Benign   Accuracy   Noduli type   Architecture
Han et al28       538         622      82.5       GGO           CNN
Zhao et al29      375         398      82.2       All types     CNN
Song et al30      2311        2265     84.2       All types     CNN
Proposed model    648         1324     87.3       >30 mm        KNG-CNN

Abbreviation: KNG-CNN, kernel based non-Gaussian convolutional neural network.



ORCID
Selvaraj Thomas George https://orcid.org/0000-0003-0304-495X

REFERENCES
1. Franco MLN. Influence of ROI Pattern on Segmentation in Lung Lesions. Switzerland: Springer International Publishing; 2015.
2. Das A. A Novel Analysis of Clinical Data and Image Processing Algorithms in Detection of Cervical Cancer. Berlin Heidelberg: Springer-Verlag; 2015.
3. Ma C, Guo X. Effect of region of interest size on ADC measurements in pancreatic adenocarcinoma. Cancer Imaging. 2017;17:13. https://doi.org/10.1186/s40644-017-0116-6.
4. Gutiérrez R. A visual model approach to extract regions of interest in microscopical images of basal cell carcinoma. Boi Med. 2013;8(Suppl 1):S36.
5. Karaçali B, Tözeren A. Automated detection of regions of interest for tissue microarray experiments: an image texture analysis. BMC Med Imag. 2007;7:2. https://doi.org/10.1186/1471-2342-7-2.
6. Kwon D, Kim H, Kim J, Suh SC, Kim I, Kim KJ. A survey of deep learning-based network anomaly detection. Cluster Comput. 2017;22:1-13.
7. Bhatia N, Rana MC. Deep learning techniques and its various algorithms and techniques. Int J Eng Innov Res. 2015;4:707-710.
8. Sharma K. Lung Cancer Detection in CT Scans of Patients Using Image Processing and Machine Learning Technique. Singapore: Springer Nature Singapore Pte Ltd; 2018.
9. Teramoto A. Automated classification of lung cancer types from cytological images using deep convolutional neural networks. BioMed Res Int. 2017;2017:4067832. https://doi.org/10.1155/2017/4067832.
10. Song QZ, Zhao L, Luo XK, Dou XC. Using deep learning for classification of lung nodules on computed tomography images. J Healthc Eng. 2017;2017:8314740. https://doi.org/10.1155/2017/8314740.
11. Kuan K. Deep learning for lung cancer detection: tackling the Kaggle data science bowl 2017 challenge. IEEE. 2017:1-9.
12. Al-Absi HRH. Computer aided diagnosis system based on machine learning techniques for lung cancer. IEEE. 2012:295-300. https://doi.org/10.1109/ICCISci.2012.6297257.
13. Rao P. Convolutional neural networks for lung cancer screening in computed tomography (CT) scans. IEEE. 2016:489-493.
14. Li Z. A novel method for lung masses detection and location based on deep learning. Paper presented at: 2016 International SoC Design Conference (ISOCC); November 13-16, 2017; Kansan City, USA.
15. Shimizu R. Deep learning application trial to lung cancer diagnosis for medical sensor systems. Paper presented at: IEEE; October 23-26, 2016; Jeju, South Korea.
16. Alam Miah MB. Detection of lung cancer from CT image using image processing and neural network. May; 2015. https://doi.org/10.1109/ICEEICT.2015.7307530.
17. Hamidian S. 3D convolutional neural network for automatic detection of lung nodules in chest CT. Proc SPIE Int Soc Opt Eng. 2017;2017:10134. https://doi.org/10.1117/12.2255795.
18. Zhong G, Wang LN, Ling X, Dong J. An overview on data representation learning: from traditional feature learning to recent deep learning. J Fin Data Sci. 2016;2(4):265-278.
19. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Proces Syst. 2012;1097-1105.
20. Lv Y, Duan Y, Kang W, Li Z, Wang FY. Traffic flow prediction with big data: a deep learning approach. IEEE Trans Intell Transp Syst. 2015;16(2):865-873.
21. Ioannidou A, Chatzilari E, Nikolopoulos S, Kompatsiaris I. Deep learning advances in computer vision with 3D data: a survey. ACM Comput Surv (CSUR). 2017;50(2):20.
22. Romero A, Gatta C, Camps-Valls G. Unsupervised deep feature extraction for remote sensing image classification. IEEE Trans Geosci Remote Sens. 2016;54(3):1349-1362.
23. Stober S, Sternin A, Owen AM, Grahn JA. Deep feature learning for EEG recordings. arXiv. 2015;1511.04306.
24. Kalinovsky A, Kovalev V. Lung image segmentation using deep learning methods and convolutional neural networks. 2016;21-24.
25. Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell. 2017;39(12):2481-2495.
26. Badrinarayanan V, Handa A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling. arXiv preprint. 2015;1505.07293.
27. Pehrson LM. Automatic pulmonary nodule detection applying deep learning or machine learning algorithms to the LIDC-IDRI database: a systematic review. Diagnostics. 2019;9:2-11.
28. Han G, Liu X, Zheng G, Wang M, Huang S. Automatic recognition of 3D GGO CT imaging signs through the fusion of hybrid resampling and layer-wise fine-tuning CNNs. Med Biol Eng Comput. 2018;56:2201-2212.
29. Zhao X, Liu L, Qi S, Teng Y, Li J, Qian W. Agile convolutional neural network for pulmonary nodule classification using CT images. Int J Comput Assist Radiol Surg. 2018;13:585-595.
30. Song QZ, Zhao L, Luo XK, Dou XC. Using deep learning for classification of lung nodules on computed tomography images. J Healthc Eng. 2017;2017:1-7.

How to cite this article: Jena SR, George ST. Morphological feature extraction and KNG-CNN classification of CT images for early lung cancer detection. Int J Imaging Syst Technol. 2020;1-13. https://doi.org/10.1002/ima.22445