
Article
Automatic Detection and Staging of Lung Tumors
using Locational Features and
Double-Staged Classifications
May Phu Paing 1, Kazuhiko Hamamoto 2, Supan Tungjitkusolmun 1 and
Chuchart Pintavirooj 1,*
1 Faculty of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand;
59601030@kmitl.ac.th (M.P.P.); supan.tu@kmitl.ac.th (S.T.)
2 School of Information and Telecommunication Engineering, Tokai University, Tokyo 108-8619, Japan;
hama@keyaki.cc.u-tokai.ac.jp
* Correspondence: chuchart.pi@kmitl.ac.th; Tel.: +66-83-449-7543

Received: 5 May 2019; Accepted: 4 June 2019; Published: 6 June 2019 

Abstract: Lung cancer is a life-threatening disease with the highest morbidity and mortality rates of
any cancer worldwide. Clinical staging of lung cancer can significantly reduce the mortality rate,
because effective treatment options strongly depend on the specific stage of cancer. Unfortunately,
manual staging remains a challenge due to the intensive effort required. This paper presents a
computer-aided diagnosis (CAD) method for detecting and staging lung cancer from computed
tomography (CT) images. This CAD works in three fundamental phases: segmentation, detection,
and staging. In the first phase, lung anatomical structures from the input tomography scans are
segmented using gray-level thresholding. In the second, the tumor nodules inside the lungs are
detected using some extracted features from the segmented tumor candidates. In the last phase,
the clinical stages of the detected tumors are defined by extracting locational features. For accurate
and robust predictions, our CAD applies a double-staged classification: the first is for the detection of
tumors and the second is for staging. In both classification stages, five alternative classifiers, namely
the Decision Tree (DT), K-nearest neighbor (KNN), Support Vector Machine (SVM), Ensemble Tree
(ET), and Back Propagation Neural Network (BPNN), are applied and compared to ensure high
classification performance. Average accuracy levels of 92.8% for detection and 90.6% for staging
were achieved using the BPNN. Experimental findings reveal that the proposed CAD method provides
preferable results compared to previous methods; thus, it is applicable as a clinical diagnostic tool for
lung cancer.

Keywords: automatic diagnosis; backpropagation neural network; computed tomography; lung cancer; staging

1. Introduction
Lung cancer is one of the most common diseases, and it has the highest morbidity and mortality
rates of any cancer worldwide. As reported by global cancer statistics [1], there were 2.1 million new
cases and 1.8 million deaths from lung cancer in 2018. Approximately one in five cases (18.4%) of lung
cancer leads to death. In oncology, medical imaging plays a vital role in reducing the mortality rate.
Imaging is a non-invasive and painless procedure with the least harmful side effects for patients. It can
provide detailed anatomical information about the disease by generating visual representations of
tumors. Unlike other invasive methods such as surgeries and biopsies which extract and study a small
portion of tumor tissue, imaging can give a more comprehensive view and analysis of the entire tumor


region. Moreover, it is preferable for clinical routines that require an iterative analysis of the tumor
during treatment [2].
Computed tomography (CT) is a standard imaging modality for lung cancer diagnosis.
The National Lung Screening Trial (NLST) reported that lung cancer mortality could be reduced by
15–20% by performing low-dose CT screening [3]. CTs display tumors on cross-sectional images
of the body called “slices”. These slices can be stacked together to reconstruct a three-dimensional
structure which makes possible more comprehensive analysis of tumors. However, the manual
interpretation of the CT scans is prohibitively expensive in terms of time and effort. Also, it is subject to
the skill and clinical practices of the interpreter; hence, the diagnosis results may suffer from inter- and
intra-observer variation [4]. Due to these inconveniences, automated interpretation of the CT scans
is necessary for any real-world application. This fact has strongly motivated computer-aided diagnosis
systems (CADs) to become an extensive research area in the field of biomedical engineering. Scientists
have proposed a vast number of CADs for lung cancer diagnosis using modern image processing and
machine learning techniques. Nonetheless, most of these CADs have focused only on the detection
of pulmonary nodules. Nodules can indicate the possibility of lung cancer and appear as round or
oval-shaped opacities on the CT scans [5]. However, the detection of nodules does not provide
sufficient information about the severity of the disease to make proper treatment decisions. Generally,
half of the patients who undergo CT screening present more than one nodule, but not all of these
nodules are cancerous [6]. Physicians usually use the terms “nodule” and “tumor” interchangeably
when they are uncertain about the severity of the disease. Indeed, identifying the severity of a tumor
nodule is a principal issue for lung cancer diagnosis.
In clinical practice, the severity of lung cancer can be expressed as different stages using the Tumor,
Node, and Metastasis (TNM) system [7]. As the name describes, this system assesses the severity
of the disease based on three descriptors. The first descriptor (T) assesses the characteristics of the
primary tumor such as its size, local invasion, and the presence of satellite tumor nodules. The CT
modality performs T-staging well, because information about the tumor can be easily obtained from
the CT scans. The second descriptor (N) assesses the involvement of the regional lymph nodes. On the
CT scans, lymph nodes appear as small opacities with unclear silhouettes; thus, the CT modality
performs weakly for N-staging [8,9]. The last descriptor (M) assesses the invasion of the tumor to the
extra-thoracic organs, especially the brain, adrenal glands, liver and bones, and so forth [9]. Staging
of metastatic tumors usually requires additional examinations of the invaded body regions. For this
reason, we focus only on the T-staging of tumor nodules in this study. T-stages can be further divided
into sub-stages depending on the anatomic characteristics of the primary tumors. Table 1 describes the
detailed definitions of the T-stages, as determined by the eighth edition of the TNM system.

Table 1. T-staging of lung tumors.

Stage | Size (in Greatest Dimension) | Invasion
Tx | — | Tumor cannot be assessed.
T0 | — | No evidence of primary tumor.
Tis | — | Carcinoma in situ (pre-cancer).
T1 | ≤3 cm | Surrounded by lung or visceral pleura. Invades the lobar bronchus but not the main bronchus.
T1a | ≤2 cm | (same criteria as T1)
T1b | >2 cm and ≤3 cm | (same criteria as T1)
T2 | >3 cm and ≤5 cm | Invades the main bronchus, regardless of distance from the carina, but does not invade the carina. Invades the visceral pleura. Invades the hilar region.
T2a | >3 cm and ≤4 cm | (same criteria as T2)
T2b | >4 cm and ≤5 cm | (same criteria as T2)
T3 | >5 cm and ≤7 cm | Invades the chest wall, phrenic nerve, and parietal pericardium. Presents satellite nodules in the same lobe.
T4 | >7 cm | Invades the diaphragm, mediastinum, heart, great vessels, and trachea. Presents satellite nodules in different lobes.
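To make the size criteria in Table 1 concrete, the following is a minimal sketch of the size-only part of the T descriptor as printed in the table; the function name is hypothetical, and the invasion criteria, which can upstage a tumor regardless of size, are deliberately omitted:

```python
def t_stage_by_size(d_cm: float) -> str:
    """Map a tumor's greatest dimension (cm) to the size-only T stage
    of Table 1. Invasion-based upstaging is not modeled here."""
    if d_cm <= 2:
        return "T1a"
    if d_cm <= 3:
        return "T1b"
    if d_cm <= 4:
        return "T2a"
    if d_cm <= 5:
        return "T2b"
    if d_cm <= 7:
        return "T3"
    return "T4"
```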

2. Related Works
The remaining problem of existing lung cancer diagnosis systems is that they are capable of
detecting tumor nodules only. As there are numerous nodule-like lesions in the lungs, it is still
challenging to detect real tumor nodules. Most current CADs thus try to reduce the number of false
positives, which can hinder the detection of real nodules. This means that current CADs defer staging
assessments. Even though there have been a lot of studies on the computerized detection of tumor
nodules, we found a paucity of literature on staging. In 2014, Kulkarni and Panditrao [10] proposed
an automatic tumor staging system from chest CT scans using marker-controlled watershed and
Support Vector Machine (SVM). As a pre-processing method, they smoothed the input CT images with
a Gabor filter, and then possible tumor nodules from the smoothed CT slices were segmented using
marker-controlled watershed. Next, three geometric features—area, perimeter and eccentricity—were
extracted and fed into the SVM for classification. Their method produced
diagnosis results in four stages (T1, T2, T3, and T4), and they evaluated 40 CT scans from the National
Institutes of Health/National Cancer Institute (NIH/NCI) Lung Image Database Consortium (LIDC)
dataset. However, the evaluation assessments were not described in detail.
Similarly, Ignatious et al. (2015) [11] applied marker-controlled watershed for lung tumor detection
and staging. Unlike [10], they used the sharpening method to enhance the input CT slices and then five
features—area, perimeter, eccentricity, convex area and mean intensity—were extracted. They selected
the true tumors based on an empirical threshold value of area. After that, staging was conducted by
four alternative classifiers: Support Vector Machine (SVM), Naive Bayes Multinomial classifier (NBM),
Naive Bayes Tree (NB tree), and Random Tree. Their experiment was validated using 200 CT slices
from Regional Cancer Centre Trivandrum, and an accuracy level of 94.4% was obtained by the Random
Tree classifier. Nevertheless, tumor segmentation based on the marker-controlled watershed is hard to
fully automate, and it offers little guarantee of detecting nodules of arbitrary shape and size, because creating
internal and external markers for affine-variant nodules is an arduous task.
Besides the aforementioned CADs, a multi-stage cancer detection method using watershed and
multi-class SVM classifier was also proposed by Alam et al. (2018) [12]. In their study, the median
filter was used for the pre-processing of 500 lung CT images, and then thirteen different image features
were extracted from the region of interest (ROI). In the first stage of classification, they tried to detect any
cancerous tumor nodules from the input CT exam. If there were cancerous tumors, they identified
the clinical stages of those tumors in the second stage of classification. Otherwise, they calculated
the probability of non-cancerous tumors becoming cancer in the third stage. For the first and second
stages, they used the SVM classifier, and for the third stage, they calculated the area ratio (area of the
nodule to the area of the entire lung). Their experiment reached accuracy levels of 97% for detection
and 87% for staging. Our proposed CAD, which applies two stages of classification
for detection and staging, is quite similar to that presented in [12]. However, we differ from them
by extracting the locational features of cancerous tumors and using alternative machine learning
algorithms for classification. Moreover, we determine the clinical stages of tumors using the eighth
edition of the TNM staging method, which is a global, up-to-date standard for cancer staging. In their
approach, they classified the stages (initial, middle, and final) roughly; thus, it was not very beneficial
for precise treatment.
Recently, deep learning models have made great breakthroughs for machine learning [13].
Convolutional Neural Networks (CNNs) are the most frequently used deep learning model for the
automated detection of lung tumors. However, for staging, CNNs have been applied in very few
studies. Kirienko et al. (2018) [14] developed a CNN for T1–T2 or T3–T4 staging of lung tumors.
Their staging referenced the seventh edition of the TNM system and was tested with 472 FDG-PET/CT
scans. Their proposed CNN was constructed using 2D patches that were cropped around the center of
the tumor. Additionally, their CNN used two neural networks: (i) one as a feature extractor to extract
the most relevant features from a single patch, and (ii) another as a classifier to perform classifications
for all slices. They reported a high testing accuracy of 90%, but the outputs were presented as binary

numbers (T1–T2 with label = 0 and T3–T4 with label = 1), that is, their method cannot give an accurate
result for each specific stage. Moreover, their approach is not fully automated, because their CNN
requires manually cropped patches as inputs.
Another tumor staging method using a double convolutional deep neural network (CDNN) was presented in [15]. They used 73 CT exams from the Image and Data Archive of the University of South Carolina and the Laboratory of Neuro Imaging (LONI) dataset. They divided the input exams into two piles: pile 1 (with cancer) and pile 2 (without cancer). Then, all slices in both piles were further grouped depending on the angles using the K-means algorithm. Subsequently, they built a double CDNN containing double convolution layers and an additional max pooling layer. They reported a prediction accuracy of 99.62%. Nonetheless, their CDNN output only decimal values between 0 (non-cancer) and 1 (cancer); hence, a threshold value was necessary to determine the possibility of tumor stages.

The primary objective of this paper is to propose a CAD that is capable of not only the detection of tumor nodules, but also of their staging. We present two significant contributions in our work; the first is performing double-staged classification, and the second is the extraction of locational features. In the first contribution, we perform two stages of classification: one for the detection task and another for the staging task. Performing double-staged classification helps to minimize false positives and makes the CAD more accurate and robust. In the second contribution, we extract the locational features to identify the stages of the lung tumor, because the severity of lung cancer strongly depends on the sizes and locations of the tumor nodules. Moreover, knowing the exact locations of tumor nodules is implicitly beneficial for the surgical planning of higher-staged tumors. The rest of this paper is organized as follows. Section 3 describes the details of the methods applied in this study. Section 4 presents the experimental results and performance evaluation of the proposed method. Finally, Section 5 presents the discussion and conclusions of the paper.

3. Methods

This section presents the details of the procedures and algorithms applied in our proposed CAD. Figure 1 outlines the typical architecture of the proposed CAD, which operates in three fundamental phases: (i) segmentation of the lung anatomical structures, (ii) detection of true tumor nodules, and (iii) staging of tumor nodules.

Figure 1. Typical architecture of the proposed computer-aided diagnosis system (CAD).
3.1. Segmentation of the Lung Anatomical Structures
Segmentation of the lung anatomical structures from the input CT scans needs to be performed as
an initial task. This makes detection faster and easier by eliminating unwanted non-body areas from the
input image, for example, exam sheets, tables, and so on. Moreover, it is crucial for the extraction
of information in the subsequent analysis of tumor stages. For accurate segmentation of the lung
structures, background anatomy knowledge is indispensable. Hence, before segmentation, we briefly
study the anatomical structures of the lungs and their appearance on the CT scan (Section 3.1.1).
Based on this anatomical knowledge, the segmentation is conducted using two basic algorithms:
gray-level thresholding (Section 3.1.2) and isotropic interpolation (Section 3.1.3). However, depending on the structure that we want to segment, some methods may be different. Thus, the details of the segmentation methods for each lung structure are described in Sections 3.1.4–3.1.9. Regarding the proper localization of lung tumors relative to the surrounding anatomical structures, these segmentations will make the locational feature extraction process more convenient.

3.1.1. Lung Anatomical Structures

As depicted in the schematic diagram of lung structures in Figure 2a, there are two major organs: the left and the right lung. Each is enclosed by a pleural sac with two membranes called visceral pleura (inner membrane highlighted by the black border) and parietal pleura (outer membrane highlighted by the blue border). These membranes protect and attach the lungs to the thoracic cavity. A double-fold of visceral pleura called the lung fissure divides the lungs into separate lobes. The left lung is approximately 10% smaller than the right due to heart space in the thorax [16]. Thus, the left lung has only two lobes (upper and lower) which are separated by an oblique fissure, while the right lung has three lobes (upper, middle and lower) which are separated by oblique and horizontal fissures. As the primary function of the lungs is to carry out gas exchange, 80–90% of their area is filled with low density air having −1000 Hounsfield Units (HU) [17]. The windpipe, also known as the trachea (denoted by green color), carries the air from the throat to the lungs and enters each lung by branching into two main bronchi (left indicated in brown and right indicated in dark blue) at the anatomical point called the carina. The right main bronchus further splits into three lobar bronchi which enter the three lobes of the right lung. Likewise, the left bronchus splits into two lobar bronchi for each left lung lobe. These bronchi undergo multiple divisions, forming the bronchial airway tree. Similar to the bronchial airway system, pulmonary vessels also branch throughout the lungs in a tree structure. The pulmonary vascular tree carries out the transportation of deoxygenated blood from the heart to the lungs to perform oxygenation, whereas the bronchial airway tree supports the flow of oxygen to the lung structures, including the pulmonary vessels [18]. Both of these trees enter and exit the lungs through a place called the hilum (denoted by the green dotted oval).
Figure 2. Anatomical structure of the human lungs: (a) schematic diagram of lung structures; (b) lung structures on the CT scan.

Figure 2b illustrates the lung anatomical structures on the CT scan. As we can see on the scan,
there are two distinct components: the air (the black regions) and the human body (the gray regions) [17].
These two components have different radiographic densities in which the air (−1000 HU) has a much
lower density than the human body (−400 HU to 1000 HU). The human body consists of structures such
as parenchyma or lung regions, fluid, fat, soft tissues, bones, and vessels. The radiological densities of
these structures are different from each other showing different gray levels on the CT scan. Figure 3
demonstrates the range of radiological densities (HU values) of each lung structure.
3.1.2. Gray-Level Thresholding

Several researchers have proposed different segmentation methods for lung anatomical structures. We applied gray-level thresholding due to its quick and straightforward procedure. To segment the region of interest (ROI), thresholding replaces an image pixel with a black one if it has a gray level of less than a predefined threshold value:

If g(Px,y) < T, then Px,y = 0, else 1, (1)

where g(Px,y) is the gray value of the image pixel Px,y and T is the threshold gray value.

As we discussed, the lung anatomical structures are allocated in a particular gray-level range on a CT scan. They appear in a range between black and white; for instance, the air appears black, the fluid appears gray, soft tissues appear in various shades of gray, bones appear dense white, and vessels appear white. However, every pixel of each specific region represents an individual gray value that is related to the mean radiographic density of the lung anatomical structures [19]. The relationship between the gray value of a pixel and its associated density can be expressed by

g(Px,y) = (HU − b)/m, (2)

where HU is the radiographic density, b is the rescale intercept, and m is the rescale slope of the CT image. Using this equation, we convert the HU values of the lung structures into gray values to set a proper threshold.
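As a concrete illustration of Equations (1) and (2), the sketch below converts a density threshold from HU into a stored gray value and applies the binary thresholding; the rescale slope and intercept are illustrative values that would normally be read from the DICOM header, and all names are ours, not the authors':

```python
import numpy as np

def hu_to_gray(hu, intercept, slope):
    """Equation (2): map a radiographic density (HU) to the
    corresponding stored gray value, g = (HU - b) / m."""
    return (hu - intercept) / slope

def threshold_segment(slice_gray, t_gray):
    """Equation (1): pixels below the threshold become 0 (black),
    all others become 1."""
    return np.where(slice_gray < t_gray, 0, 1).astype(np.uint8)

# Example: a -400 HU cutoff, assuming an illustrative CT rescale pair
# (slope m = 1, intercept b = -1024) taken from the scan metadata.
t = hu_to_gray(-400.0, intercept=-1024.0, slope=1.0)
slice_gray = np.random.randint(0, 2000, size=(512, 512))  # stand-in for a CT slice
mask = threshold_segment(slice_gray, t)
```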
Figure 3. The range of radiological density of the lung structures.
3.1.3. Isotropic Interpolation
Lung CT scans are two-dimensional (2D) cross-sectional images taken from different angles around the body. These slices can be stacked together and reconstructed into a three-dimensional (3D) isotropic image for more comprehensive analysis and visualization. Assume that V(x, y, z) is 3D data of stacked CT slices with the dimensions Nx × Ny × Nz (width × height × thickness), where Nx = Ny (generally it is equal to 512, which is the typical resolution of a 2D CT slice) and Nz is the total number of slices in a single exam. Here, the term “single exam” means all of the sequential CT slices for a single patient during one CT scan. Then, the position of a voxel in V can be expressed by V(xp, yq, zk), where:

{xp} for p = 0, …, Nx − 1; {yq} for q = 0, …, Ny − 1; and {zk} for k = 0, …, Nz − 1. (3)

For the CT scans, we can assume that xp = pΔx, yq = qΔy, and zk = kΔz, where Δ denotes the interval between pixels along each coordinate. In a situation with inevitable noise, the positions of the voxel are defined as

g(xp, yq, zk) = V(xp, yq, zk) + n(xp, yq, zk), (4)

where n(xp, yq, zk) represents the additive noise. After stacking the 2D slices, linear interpolation has to be carried out across the z-axis, because the thickness of the CT slices may vary even in a single exam:

glin(z) = [1 − (z − zk)/(zk+1 − zk)] g(zk) + [(z − zk)/(zk+1 − zk)] g(zk+1), zk ≤ z ≤ zk+1, (5)

where z is the new slice to be interpolated, and zk and zk+1 are two adjacent slices of g(x, y, z). Then, glin(z) is the resulting image after linear interpolation.
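A minimal NumPy sketch of Equation (5), resampling a stack with uneven slice spacing onto a regular z grid; the helper names and the 1 mm default step are assumptions for illustration:

```python
import numpy as np

def interpolate_slice(g_k, g_k1, z, z_k, z_k1):
    """Equation (5): linearly blend two adjacent slices g(z_k) and g(z_k+1)
    to synthesize the slice at position z (z_k <= z <= z_k+1)."""
    w = (z - z_k) / (z_k1 - z_k)
    return (1.0 - w) * g_k + w * g_k1

def make_isotropic(slices, positions, step=1.0):
    """Resample a list of 2D slices at physical z positions (mm, ascending)
    onto an evenly spaced grid with the given step."""
    new_z = np.arange(positions[0], positions[-1], step)
    out = []
    for z in new_z:
        k = np.searchsorted(positions, z) - 1     # index of the slice below z
        k = int(np.clip(k, 0, len(slices) - 2))
        out.append(interpolate_slice(slices[k], slices[k + 1],
                                     z, positions[k], positions[k + 1]))
    return np.stack(out)
```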
Some of the previous CAD methods reconstructed the volumetric structure of the CT slices at the
beginning stage and then detection was performed on the reconstructed volume. Although working
on the isotropic image provides comprehensive information about the tumor nodules, it demands
high computational effort and time. Conversely, working on 2D data demands less effort but
provides insufficient information about the tumor nodules. As an expedient method, we performed
segmentation slice-by-slice on 2D data; after that, we reconstructed an isotropic image of 2D segments
for further analysis.

3.1.4. Lung Region Segmentation


Lung regions constitute a large proportion of air with a radiographic density ranging from
−600 HU to −400 HU on the CT scans. In keeping with the histogram of HU values in Figure 3, the CT
value −400 HU separates the air-filled regions from the human tissues well. Thus, we chose −400 HU
as the threshold density to segment the lung regions. The corresponding gray value of the selected
density HU can be converted using Equation (2). With this threshold gray value, we performed
segmentation by applying gray level thresholding.
As an illustration, Figure 4a shows a 2D image of lung CT scan where black portions represent
air-filled regions and gray portions represent the human tissues. After thresholding using −400 HU,
the body region can be eliminated by substituting it with black pixels, as illustrated in Figure 4b.
However, we obtained two separate air-filled regions (white pixels): one outside of the eliminated
body and another inside the body region. Our region of interest, the lungs, comprises air-filled regions
inside the body. Thus, we suppressed all the white pixels outside of the body, as shown in Figure 4c.
After the suppression, we only had air-filled regions inside the body and these are the lung regions.
Subsequently, the resulting lung regions were morphologically enclosed, as shown in Figure 4d.
The aim of enclosing was to prevent under-segmentation of lesions inside the lungs, especially the
tumor nodules which are adhered to the lung boundary. Moreover, we performed separation of the
left and right lungs because they may touch each other in some CT slices. Morphological erosion can
help to separate the two lungs, as shown in Figure 4e. For all sequential CT slices in a single exam, all
the procedures from (a) to (e) were conducted slice by slice. After that, all resulting 2D lung segments
were linearly interpolated to generate an isotropic image, as depicted in Figure 4f.
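Under the assumption of a NumPy/scikit-image toolchain (the paper does not name its implementation), the steps of Figure 4b–e can be sketched as follows; the −400 HU cutoff comes from the text, while the structuring-element sizes and helper name are illustrative:

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import clear_border
from skimage.morphology import binary_closing, binary_erosion, ball

def segment_lungs(volume_hu):
    """Sketch of Section 3.1.4: threshold at -400 HU, drop the air outside
    the body, close the lung boundary, and erode to split the two lungs."""
    air = volume_hu < -400                           # air-filled regions (Figure 4b)
    air = np.stack([clear_border(s) for s in air])   # suppress outside-body air (4c)
    air = binary_closing(air, ball(3))               # enclose wall-adhered lesions (4d)
    lungs = binary_erosion(air, ball(2))             # separate touching lungs (4e)
    # keep the two largest components as the left and right lungs
    labels, n = ndimage.label(lungs)
    sizes = ndimage.sum(lungs, labels, range(1, n + 1))
    keep = np.argsort(sizes)[-2:] + 1
    return np.isin(labels, keep)
```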

Figure 4. Segmentation of the lung regions: (a) input 2D computed tomography (CT) slice, (b) segments of air-filled regions, (c) suppressed air-filled regions outside of the body, (d) connection between the two lungs, (e) separation of the left and right lungs using morphological erosion, (f) isotropic structure of the segmented lung regions.

3.1.5. Trachea and Bronchial Tree Segmentation

Following the segmentation of the lung regions, the next step was to segment the trachea and bronchial tree. These two organs carry out the distribution of oxygen in the lungs and show air density on the CT scans, as depicted in Figure 5a. According to the lung anatomy, the trachea is a tube-like air-filled structure which is located between the two lungs, and it enters each lung by branching into two main bronchi. Based on this fact, the lesions with air density were roughly segmented using the threshold density of −900 HU [20], as illustrated in Figure 5b. From these air-filled segments, a single lesion located in the middle of the two lungs was segregated as the trachea, as shown in Figure 5c. Similar to the previous step, trachea segmentation was also executed slice-by-slice, and then the isotropic image was reconstructed. After the isotropic reconstruction, the tube-like structure of the segmented trachea was visualized, as in Figure 5d.

After tracheal segmentation, segmentation of the bronchial airways was attempted using 3D region growing. Using the center voxel of the segmented trachea as a seed point, region growing was iteratively performed based on different threshold values. Similar to [20], we set the initial threshold value as −900 HU and increased this by 64 HU in every iteration to find the airway regions. However, region growing may suffer from leakage of segmentation in some cases, as the bronchial airways are spread throughout the lungs. For instance, Figure 5e depicts the leakage of bronchial airway segmentation into the parenchyma (lung regions). For such cases, the threshold value needed to be adjusted in each iteration to prevent leakage. If the total volume of the segmented airway tree in the current iteration was at least twice that of the previous iteration, region growing leakage was detected. Thus, the threshold value was reduced by half to stop the leakage. In this way, the trachea and the bronchial tree were successfully segmented, as shown in Figure 5f.
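The adaptive region growing described above can be sketched as follows; this is our reading of the procedure, with the −900 HU start and 64 HU increment taken from the text and everything else (names, the exact form of the halving rule) illustrative:

```python
import numpy as np
from scipy import ndimage

def grow_airways(volume_hu, seed, t0=-900.0, step=64.0):
    """Sketch of Section 3.1.5: grow the airway tree from a tracheal seed,
    raising the threshold by `step` HU per iteration; when the segmented
    volume at least doubles (the leakage criterion), retract the last
    increase and halve the step. `seed` is a (z, y, x) voxel index."""
    t, prev, tree = t0, None, None
    while step >= 1.0:
        mask = volume_hu < t
        labels, _ = ndimage.label(mask)
        grown = labels == labels[seed]      # component connected to the seed
        vol = int(grown.sum())
        if prev is not None and vol >= 2 * prev:
            t -= step                       # leakage: undo the last increase
            step /= 2.0                     # and retry with a smaller step
            continue
        tree, prev = grown, vol
        t += step
    return tree
```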
Figure 5. Segmentation of the trachea and bronchial airway: (a) input 2D CT slice, (b) segments of air-filled regions, (c) a segment of the trachea, (d) isotropic structure of segmented trachea, (e) leakage of region growing into the parenchyma, (f) isotropic structure of successfully segmented trachea and bronchial tree.

3.1.6. Pulmonary
3.1.6. Pulmonary Vessel
VesselSegmentation
Pulmonary Vessel Segmentation
3.1.6. Segmentation
Pulmonary vessels
Pulmonary vessels have
have aaa complex
vessels have complex structure
structure with
with several
several branches
branches spanning
spanning throughout
throughout thethe
Pulmonary complex structure with several branches spanning throughout the
lung surfaces.
lung surfaces. They
surfaces. They are
They are elongated,
are elongated, tubular,
elongated, tubular, and
tubular, and appear
and appear bright
appear bright on
bright on the
on the CT
the CT scans
CT scans [21]
scans [21] as
[21] as can
as can be
can be seen
be seen in
seen inin
lung
Figure
Figure 6a. Generally,
6(a). Generally, they have
they havea soft-tissue
a density
soft-tissue of
densityapproximately
of approximately 40 HU
40 [22].
HU However,
[22]. we
However, did
we not
did
Figure 6(a). Generally, they have a soft-tissue density of approximately 40 HU [22]. However, we did
use
not this
usedensity value as
this density
density a threshold
value because vessels
as aa threshold
threshold becauseare usually
vessels areinusually
contact with
usually adjacent
in contact
contact pulmonary
with adjacent
not use this value as because vessels are in with adjacent
structures
pulmonary such as airways
structures such and
as nodules.
airways and nodules.
pulmonary structures such as airways and nodules.

Figure 6. Segmentation of the pulmonary vessels: (a) input 2D CT slice, (b) segments of lung regions; (c) segments of lung internal structures, (d) enhanced vessels using multiscale vessel enhancement filtering, (e) isotropic structure of the segmented pulmonary vessels.
As a prior process to vessel segmentation, we extract all of the lesions inside the segmented lungs by applying a threshold density of −800 HU, as illustrated in Figure 6c. Among these segmented lesions, we extracted the vessels using multiscale vessel enhancement filtering [23]. This enhanced and discriminated the vessels from the interference lesions based on the eigenvalue analysis of the Hessian matrix. As in the previous steps, vessels from 2D CT slices were segmented slice-by-slice, and then isotropic data was generated, as illustrated in Figure 6e.
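A hedged sketch of this step using scikit-image's Frangi filter as a stand-in for the multiscale vessel enhancement filter of [23]; the −800 HU cutoff is from the text, while the scales and the final vesselness cutoff are illustrative:

```python
import numpy as np
from skimage.filters import frangi

def enhance_vessels(volume_hu, lung_mask):
    """Sketch of Section 3.1.6: keep lesions above -800 HU inside the lungs,
    then score tubular structures with a multiscale Hessian-based
    vesselness filter."""
    lesions = np.where(lung_mask & (volume_hu > -800),
                       volume_hu, volume_hu.min())
    vesselness = frangi(lesions.astype(float),
                        sigmas=range(1, 5),       # multiscale analysis
                        black_ridges=False)       # vessels are bright on CT
    return vesselness > 0.5 * vesselness.max()    # illustrative cutoff
```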
3.1.7. Fissure Detection and Lung Lobe Segmentation

Pulmonary fissures are double layers of visceral pleura and appear as bright lines spanning the parenchyma on the CT scan (Figure 7a). In order to segment the fissures correctly, prior anatomical knowledge is crucial. As we discussed in Section 3.1.1, fissures divide the lungs into five separate lobes. A fissure is treated as a boundary between two lung lobes; thus, the bronchial airways and vessels cannot overpass it to enter the adjacent lobe. Based on this fact, the location of a fissure can be assumed to be a gap between two lung lobes where the airways and vessels are clear. As with vessel segmentation, we began fissure segmentation by detecting all of the lesions inside the lungs, as depicted in Figure 7b,c. Subsequently, we enhanced the tubular lesions using Hessian-based filtering, as shown in Figure 7d. Among these tubular lesions, we found the longest continuous and non-branching lines across the lungs, as represented in Figure 7e. These lines were regarded as the possible fissure regions, and all fissure regions from 2D CT slices were segmented slice-by-slice. After achieving all 2D fissure segments, the isotropic structure of possible fissure lines was created, as in Figure 7f. From that figure, it is evident that the isotropic structure can give a more comprehensive view of fissure lines. The stacked fissure lines became plate-like structures in the isotropic view. However, there were some noise and incomplete parts in the fissure planes. Hence, we performed 3D morphological operations to remove the noise and enclose the incomplete parts. Finally, we successfully segmented the fissures by extracting the three biggest planes, as illustrated in Figure 7g. To highlight the segmented fissures, we also provide transparent views of the isotropic images. Figure 7h shows a transparent view of the oblique and horizontal fissures, dividing the right lung into three lobes. Similarly, Figure 7i shows a transparent view of the oblique fissure dividing the left lung into two lobes. Subtracting these fissure planes from the segmented lung volumes (Section 3.1.4) gave five different lobes, as illustrated in Figure 7j.

Figure 7. Segmentation of the pulmonary fissures and lung lobes: (a) input 2D CT slice, (b) segments of lung regions, (c) segments of lung internal structures, (d) enhanced tube-like structures, (e) possible 2D fissure lines, (f) isotropic view of possible fissure planes, (g) successfully segmented fissure planes, (h) transparent view of oblique and horizontal fissures in the right lung, (i) transparent view of oblique fissure in the left lung, (j) successfully segmented lung lobes.
fissure in the left lung, (j) Successfully segmented lung lobes.
3.1.8. Rib and Bone Segmentation
The ribs and bones are the highest-density lesions in the lungs. They appear as white on the CT scans, as illustrated in Figure 8a, and possess a radiographic density ranging from 400 to 1000 HU. Thus, we can easily separate them from other pulmonary lesions by finding pixel values higher than 400 HU. As in the previous steps, an isotropic image of the ribs and bones can be generated using 2D segments from each slice. For illustration, Figure 8b represents the 2D segments and Figure 8c represents the 3D segments of the ribs and bones.

Figure 8. Segmentation of ribs and bones: (a) input 2D CT slice, (b) 2D segments of the ribs and bones, (c) isotropic structure of the ribs and bones.

3.1.9. Segmentation of Nodule Candidates
Nodules are soft tissue lesions exhibiting a high potential to be cancer. In Figure 9a, we can see an example of a nodule (red-bordered) and other nodule-like false lesions (yellow-bordered). For a clear visualization, a zoomed-in view of Figure 9a is provided in Figure 9b. Nodules can be divided into three types—solid, non-solid, and partly solid—depending on their radiological density. The first type, solid nodules, is found most frequently and has a soft tissue density if isolated. In the case of vessel attachment, their contours become obscure. The second type, non-solid nodules, is inferred by vessel density and appears as a ground-glass structure. The last type, partly solid nodules, has a hybrid density of soft tissue and vessel.

Figure 9. Segmentation of the nodule candidates: (a) input 2D CT slice, (b) zoomed-in view of the parenchymal internal structures, (c) 2D nodule candidate segment, (d) isotropic structure of the segmented nodule candidates, (e) isotropic structure of blob enhancement lesions, (f) isotropic structure of the true tumor nodule.

To ensure the detection of all nodule types, the lesions inside the parenchyma were first segmented
using the threshold value of −800 HU, as illustrated in Figure 9c. Then, an isotropic image of
these internal lesions (Figure 9d) was reconstructed from 2D segments. Using this isotropic image,
we attempted to extract possible nodule lesions, also known as nodule candidates. A nodule is a round
or oval-shaped structure in a 2D CT scan, but it forms a blob-shaped structure in volumetric view.
Thus, a blob enhancement filter can be applied to extract nodules. In Figure 9e, we can see the isotropic
structures of the segmented nodule candidates. In this step, we cannot directly segment the true tumor
nodules, because there are many nodule-like structures in the lungs. These structures interfere with
detection and make it troublesome to identify the real nodules. Since nodules are the signs of lung cancer, wrong
or missed detection can seriously affect the patient. For this reason, nodule candidates are segmented
first in this step, and then actual nodules are segregated with the help of a machine learning algorithm
in the next detection phase.
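As an illustration of the blob-enhancement idea (the paper does not specify its exact filter), a multiscale Laplacian-of-Gaussian response peaks on roughly spherical lesions; the scales, threshold symbol, and the `seg_lesions` input name below are assumptions:

```python
import numpy as np
from scipy import ndimage

def blob_enhance(volume, sigmas=(1, 2, 4)):
    """Sketch of Section 3.1.9: a nodule is roughly spherical in 3D, so a
    scale-normalized multiscale Laplacian-of-Gaussian response (used here
    as a generic blob filter) is strongest at blob-like lesions."""
    responses = [-(s ** 2) * ndimage.gaussian_laplace(volume.astype(float), s)
                 for s in sigmas]
    return np.max(responses, axis=0)

# Candidates are then the connected components of the strongest responses:
# candidates, n = ndimage.label(blob_enhance(seg_lesions) > tau)
```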

3.2. Detection of the True Tumor Nodules


The primary issue of lung cancer diagnosis is to detect the true tumor nodules. Discriminating
true nodules from other interference lesions remains a challenge, because there is a high number of
false positive lesions inside the lungs. In clinical practice, radiologists generally use several radiological
features such as size, shape, margin and morphology to adequately identify a true tumor nodule.
Based on these radiological criteria, some image features from each nodule candidate are extracted and
fed into a classification algorithm to segregate the real tumor nodules.

3.2.1. Nodule Feature Extraction


Instead of extracting image features from 2D segments, we extracted features from each isotropic
nodule candidate to get more precise measurements. We obtained 28 nodule features in total based on
three major feature groups: geometric, intensity and texture.
The first group, geometric features, contains the most commonly used features for nodule detection.
They can provide information about the physical appearance of tumor nodules. Four geometric
features—volume, perimeter, equivalent diameter and sphericity—were extracted in this study.
The second group, intensity features, is also significant for tumor detection, because lesions of the
same parenchymal type have similar intensity values. These features can be calculated from the first-order
statistics of the histogram, and we extracted five intensity features, namely the average radiological
density (HU), mean, standard deviation, skewness, and kurtosis.
In addition, the last group, texture features, is of great help in identifying the true
nodules. They can provide the inherent characteristics of nodules such as the constituents of air,
fat, cavitation, and so forth [24]. They are second-order statistical features and are usually based
on the gray-level co-occurrence matrix (GLCM). Nineteen textural features—autocorrelation, cluster
prominence, cluster shade, contrast, correlation, different entropy, different variance, dissimilarity,
energy, entropy, homogeneity, correlation1, correlation2, inverse difference, maximum probability, sum
average, sum entropy, square variance, and sum variance—were extracted for texture analysis. For
interested readers, detailed descriptions and mathematical calculations of these extracted features can
be found in [25,26].
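As an illustration, a few of these GLCM statistics can be computed with scikit-image; this is a sketch on a single axial slice, and the choice of 32 gray levels and two angles is ours, not the paper's:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_features(slice_2d, levels=32):
    """A few Haralick-style GLCM statistics on one axial slice."""
    # Quantize to `levels` gray values so the co-occurrence matrix stays dense.
    edges = np.linspace(slice_2d.min(), slice_2d.max(), levels)
    q = (np.digitize(slice_2d, edges) - 1).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    return {name: float(graycoprops(glcm, name).mean())
            for name in ("contrast", "correlation", "dissimilarity",
                         "energy", "homogeneity")}
```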

3.2.2. First Stage Classification


Once the features had been extracted from each isotropic nodule candidate, they were fed into the classification algorithm to select the true tumor nodules. Our proposed CAD has two stages of classification; the first stage aims to discriminate the true tumor nodules from other false lesions, while the second stage aims to classify the associated stages of the truly predicted tumors. For both classification stages, we empirically chose the Back Propagation Neural Network (BPNN) due to its performance. The detailed experiments and a performance comparison of BPNN with other classifiers will be explained later in Section 4. As the BPNN is a particular type of artificial neural network (ANN), it also works on three fundamental layers (an input layer, one or more hidden layers, and an output layer). Unlike a plain ANN, the BPNN back propagates the learning error in order to minimize it. The twenty-eight extracted nodule features become input neurons, and two classes (true nodules and false nodules) are the outputs of the first-stage BPNN.
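For concreteness, a minimal stand-in for this network can be built with scikit-learn, whose MLPClassifier trains by error backpropagation. The hidden-layer size and the X_train/y_train splits are assumptions for illustration; the paper does not specify its architecture here:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 28 standardized nodule features in, 2 classes (true/false nodule) out.
first_stage = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0),
)
first_stage.fit(X_train, y_train)          # X_train: (n, 28); y_train: labels
is_true_nodule = first_stage.predict(X_test)
# The second-stage network is built the same way with 32 inputs
# (28 nodule features + 4 locational features) and seven T-stage classes.
```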

3.3. Staging

This phase is the key contribution of this research, because most current CADs are capable of tumor detection but not staging assessments. Staging measures the severity of lung cancer by determining how large the tumors are and how much they have spread. Knowing the exact stage of a tumor can give a great deal of help in providing precise treatments for patients. Hence, it plays a vital role in increasing the patient's survival rate. As we described earlier, our second-stage classification aims to perform the staging of the true tumor nodules. To carry out this task, we reused the nodule features of the tumors truly predicted by the first-stage classification. Furthermore, we extracted additional locational features to determine the invasion and spread of the tumors.

3.3.1. Locational Feature Extraction

Locational features are essential parameters for identifying the stages of lung cancer. They can provide information about the local extent or invasion of the primary tumors. As described in Table 1, the stages of tumors are highly associated with their locations. T1-staged tumors are located inside the lungs and in the lobar bronchus; T2-staged tumors invade the main bronchus, visceral pleura, and hilar regions; T3-staged tumors spread to the chest wall, phrenic nerve, and parietal pericardium; and finally, T4-staged tumors invade the diaphragm, mediastinum, heart, great vessels, and trachea. Considering these clinical descriptions, we extracted the locational features of the tumor nodules by dividing the lungs into different locational zones. The segmentation of lung anatomical structures in Section 3.1 gives a great deal of help for dividing the locational zones.
Firstly, we divided the isotropic structure of the trachea and bronchial airway tree (Section 3.1.5) into three different zones. The first zone (Zone 1) represents the regions of the trachea; the second zone (Zone 2) represents the regions of the main bronchus and hilar; and the third zone (Zone 3) represents the region of the lobar bronchus. To determine these three zones, the 3D skeletonization method [27] was applied. This method iteratively erodes the image voxels until only a single-voxel centerline is achieved. Figure 10 depicts an isotropic image of the trachea and bronchial airway tree. In the figure, the thin red centerline represents the skeleton of the isotropic image.

Figure 10. Different zones of the trachea and bronchial airway.

From this centerline, we tried to detect the branching points, because the trachea branches into two main bronchi at a point called the carina. The first branching point of the centerline can be designated as the place of the carina. Based on this point, we determined the upper portion of the carina as the region of the trachea (Zone 1). Similarly, the regions between the carina point and the next branching points were defined as Zone 2—the areas of the main bronchus and hilar. Then, the other portion was assigned as Zone 3—the lobar bronchus.
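A sketch of this step, assuming scikit-image's 3D thinning for the skeleton and treating any skeleton voxel with three or more skeleton neighbors as a branching point; taking the cranial-most branching point as the carina assumes slice 0 is at the top of the scan:

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def branch_points(airway):
    """Centerline junctions of a binary airway volume."""
    skel = skeletonize(airway)                         # single-voxel 3D centerline
    kernel = np.ones((3, 3, 3), dtype=np.uint8)
    kernel[1, 1, 1] = 0
    neighbors = ndimage.convolve(skel.astype(np.uint8), kernel, mode="constant")
    return np.argwhere(skel & (neighbors >= 3))        # (z, y, x) junction voxels

def carina_point(airway, axial_axis=0):
    """First branching point from the top, assuming slice 0 is cranial."""
    pts = branch_points(airway)
    return pts[np.argmin(pts[:, axial_axis])]
```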
Once these three zones had been defined, the next location was the region of the chest wall (Zone 4). The chest wall is an organ that surrounds and protects the lungs. It is comprised of bones, especially the ribs, sternum, and spine. This zone was easily defined because the ribs and bones had already been segmented, as described in Section 3.1.8. Another location was the pulmonary vessels (Zone 5). As in Zone 4, this zone was easily defined using the vessel segments described in Section 3.1.6. Next, Zone 6 included the regions of the mediastinum, heart, phrenic nerve, and parietal pericardium. We classified these organs together into Zone 6 because they are located inside the mediastinum, the region between the two lungs. The right and left lung segments can help with the identification of this zone, and the area between the two lungs can be regarded as Zone 6. Finally, Zone 7 was defined as the location of the diaphragm, which is the portion below the lungs. To identify Zones 6 and 7, the segments of the lung regions described in Section 3.1.4 were applied. For a clear understanding, the locational zones and associated tumor stages are summarized in Table 2. Moreover, a graphical representation of these zones is also presented in Figure 11.

Table 2. Dividing different zones.

Lung Zones | Regions Invaded | Associated Tumor Stages
Zone 1 | Trachea | T4
Zone 2 | Main bronchus and hilar | T2
Zone 3 | Lobar bronchus | T1
Zone 4 | Chest wall | T3
Zone 5 | Pulmonary vessels | T4
Zone 6 | Mediastinum, heart, phrenic nerve, and parietal pericardium | T3, T4
Zone 7 | Diaphragm | T4

Figure 11. Different locational zones of the lungs.
The idea behind dividing the lungs into different locational zones is to check the invasion of the tumor nodules. This task can be accomplished by calculating the 3D convex hull and comparing the coordinate values. The convex hull of a region can be calculated by finding the smallest convex polygon which encloses all of the points of that region. In this study, we applied the Quickhull algorithm [28], because it can handle high-dimension problems. As we were working on 3D volumetric data, the algorithm returned a p-by-3 matrix, where p is the total number of vertex points of the polygon, and each point has three coordinate values (x, y, z). Assuming T is a tumor nodule and Lz is a specific locational zone (any zone from Zone 1 to Zone 7 is possible), we checked the invasion of T into Lz using the invasion checking Algorithm 1 described below.

Algorithm 1. Invasion checking

1. Calculate the convex hulls of T and Lz and name them CT and CLz, respectively. Both of them are p-by-3 matrixes, where p is the number of vertices in the convex polygon, and each vertex has x, y and z coordinate values.
2. Assign the initial status of invasion with a Boolean value such that Invasion = 0.
3. From CT, find six values which are the minimum and maximum of each coordinate. Let us say that minCTx and maxCTx are the minimum and maximum values among the x coordinate points. In this way, minCTy, maxCTy, minCTz, and maxCTz are also calculated from the y and z coordinate points.
4. Similar to step 3, calculate another six values from CLz. They are minCLzx, maxCLzx, minCLzy, maxCLzy, minCLzz, and maxCLzz.
5. Based on these values, check whether T is located inside or outside of Lz. If minCTx > minCLzx and minCTy > minCLzy and minCTz > minCLzz and maxCTx < maxCLzx and maxCTy < maxCLzy and maxCTz < maxCLzz, then the convex polygon of T geometrically locates inside that of Lz and is assigned the value Invasion = 1.
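A direct transcription of Algorithm 1 in Python, using SciPy's Qhull-backed ConvexHull (an implementation of the Quickhull algorithm cited in [28]); tumor_pts and zone_pts are assumed (n, 3) voxel-coordinate arrays:

```python
import numpy as np
from scipy.spatial import ConvexHull

def invasion(tumor_pts, zone_pts):
    """Algorithm 1: return 1 if the hull of T lies inside the hull of Lz, else 0."""
    ct = tumor_pts[ConvexHull(tumor_pts).vertices]     # p-by-3 hull vertices of T
    clz = zone_pts[ConvexHull(zone_pts).vertices]      # hull vertices of Lz
    inside = (np.all(ct.min(axis=0) > clz.min(axis=0)) and
              np.all(ct.max(axis=0) < clz.max(axis=0)))  # step 5 comparisons
    return int(inside)
```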

Generally, tumor nodules are isolated inside the lungs, but in some cases they may attach to the
visceral pleura. In order to investigate this, we first checked whether the tumor nodule was inside or
outside of the lung regions. For this task, we used the previously described checking algorithm again.
Here, Lz became the lung regions instead of the locational zones. Once the tumor nodule was located
inside the lungs, we needed to check whether the nodule was attached to the pleura or not. For this
task, the following steps of the pleural attachment checking Algorithm 2 were performed.

Algorithm 2. Pleural attachment checking

1. L is one of the lungs (left or right lung), and T is the tumor nodule. Both L and T are three-dimensional arrays (512 × 512 × number of slices in a single exam). Firstly, convert both arrays into logical arrays by changing all the voxel values into 1 and 0.
2. Assign a Boolean value to the initial status of pleural attachment such that Pleural attachment = 0.
3. Find the edge voxels of the logical array L, and define them as E.
4. Apply the logical negation of E to T and assign the result to a temporary array called Tmp such that Tmp = T(∼E).
5. Check whether Tmp has any values of 1 or not. If there is a value of 1, some part of the tumor is along the edge of the lung region. Modify the status of pleural attachment to Pleural attachment = 1.
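A compact sketch of Algorithm 2, under our reading that steps 4–5 amount to testing whether any tumor voxel coincides with the lung-boundary voxels E (obtained here by subtracting an erosion of the lung mask):

```python
import numpy as np
from scipy import ndimage

def pleural_attachment(lung, tumor):
    """Algorithm 2: return 1 if any tumor voxel lies on the lung boundary E."""
    edge = lung & ~ndimage.binary_erosion(lung)        # boundary voxels E
    return int(np.any(tumor & edge))                   # lung, tumor: boolean volumes
```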

After checking for pleural attachment, another essential feature is the number of tumor nodules
in the lungs. This feature can be achieved by counting the total number of true predicted nodules in
a single exam. If a patient has more than one nodule, the locations of the satellite nodules must be
considered for tumor staging. If the tumors are in the same lobe, this indicates the T3 stage, whereas if
they are in different lobes, this indicates the T4 stage. In order to know whether the tumors are in the
same lobe or different lobes, we needed to define the invasion of tumor nodules into a specific lobe.
Thus, we used the invasion checking algorithm again, but Lz became a particular lobe of the lungs. Based on the invasion results, we made five flags for lobe invasion: rightupperlobe = 1, rightmiddlelobe = 2, rightlowerlobe = 3, leftupperlobe = 4 and leftlowerlobe = 5. As a summary, the locational features extracted in this section and their return values are described in Table 3.
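The Count and Lobe location features then reduce to a loop over the true nodules, reusing the invasion sketch above; the lobes mapping is an assumed input:

```python
# `lobes` maps the flag values 1-5 (right upper/middle/lower, left upper/lower)
# to (n, 3) point arrays of the segmented lobes; `nodules` is a list of point
# arrays, one per true predicted nodule.
def count_and_lobes(nodules, lobes):
    count = len(nodules)                   # number of true predicted nodules
    flags = []
    for pts in nodules:
        flags.append(next((f for f, lobe in lobes.items()
                           if invasion(pts, lobe)), 0))
    # Two nodules sharing a flag suggest T3; different flags suggest T4.
    return count, flags
```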

Table 3. Locational features.

Features | Description | Returns
Invasion | Determines the invasion of tumor nodules into different zones and returns zone numbers. | 1, 2, 3, 4, 5, 6, 7
Pleural attachment | Determines whether the tumor nodules attach to the visceral pleura or not and returns a Boolean value (0 = no pleural attachment and 1 = pleural attachment). | 0, 1
Count | Calculates the number of true predicted tumor nodules in a single exam and returns an integer value n. | 1, …, n
Lobe location | Determines which lobe the tumor nodules exist in and returns the lobe numbers. | 1, 2, 3, 4, 5

3.3.2. Second-Stage Classification

The second-stage classification aims to classify the true tumor nodules into different T-stages. As in the first stage, the BPNN is applied in this stage. However, the first-stage classification is a binary classification problem that only predicts two classes (true tumor nodules and false tumor nodules). Unlike the first-stage classification, the second-stage classification is a multi-class classification problem that predicts seven clinical stages of tumors, namely T0/Tis, T1a, T1b, T2a, T2b, T3 and T4. Stage Tx is neglected in the classification because tumor nodules cannot be detected on the CT scans at that stage. In the first-stage classification, we used twenty-eight nodule features to segregate the true tumor nodules. After detecting a true tumor nodule, the associated nodule features of that nodule were carried to the staging phase, as depicted in Figure 1. Then, four locational features (described in Table 3) were added to determine the tumor stages in the second-stage classification. Therefore, the total number of features fed into the second-stage BPNN was thirty-two, and the number of output classes was seven.

4. Experimental Results
In this study, we propose a CAD that is capable of detecting and staging pulmonary tumors for lung cancer diagnosis. This CAD was developed and evaluated using four datasets, namely LIDC-IDRI [29,30], NSCLC-Radiomics-Genomics [2,31], NSCLC-Radiomics [2,32], and NSCLC Radiogenomics [33,34] (Supplementary Materials), where NSCLC stands for non-small cell lung cancer. These datasets are downloadable from The Cancer Imaging Archive (TCIA) [35], an open-access source of medical images for scientific and educational research. Expert radiologists confirmed all of the tumor nodules in these datasets, and the clinical diagnosis results were also provided. We applied these results as the gold standard to evaluate the performance of the proposed CAD. Table 4 presents detailed descriptions of each dataset used in this study.
As described earlier, our CAD has two primary purposes: detection and staging. The first purpose is more fundamental for the production of accurate diagnosis results because the correctly predicted tumor nodules from the first stage are further applied for staging. Among the datasets mentioned above, LIDC-IDRI only provides XML files for the true nodule annotations. The clinical stages of the tumor nodules are not available in LIDC-IDRI. Therefore, we applied it only for the detection stage. Moreover, it is a huge dataset and has been employed by several researchers to develop CADs for nodule detection. There are 1010 exams in the LIDC-IDRI dataset, but only 888 exams were applied for our experiment because some exams contain missing and inconsistent slices. The application of LIDC-IDRI for the first purpose, detection, also acted as a separate validation for more robust detection. Unlike LIDC-IDRI, the three NSCLC datasets provide data on the clinical stages of tumors. Thus, we applied these three datasets for the second purpose. Indeed, the CT exams from these three datasets had to pass through the first-stage classification to segregate the true tumor nodules. Subsequently, the true predicted tumors were categorized into different clinical stages by the second-stage classifier. Therefore, the first stage of classification was double validated by not only LIDC-IDRI, but also NSCLC.
Table 4. Detailed descriptions of the datasets applied.

Dataset | Descriptions | Number of Exams Used
LIDC-IDRI | Contains 1010 thoracic CT scans of lung cancer patients and an associated XML file that shows the true nodule annotations. | 888
NSCLC-Radiomics-Genomics | Contains 89 CT scans from non-small cell lung cancer patients who underwent surgery. The clinical data of tumor stages is also available. | 89
NSCLC-Radiomics | Contains 422 CT scans from non-small cell lung cancer patients. Three-dimensional manual delineation and clinical results by radiologists are available. | 422
NSCLC Radiogenomics | Contains 211 CT, PET/CT scans of non-small cell lung cancer patients. Both annotation files and clinical data are also provided. | 211

However, the second stage of classification is a multi-class classification problem that outputs seven classes. Class 1 represents T1a, class 2 represents T1b, class 3 represents T2a, class 4 represents T2b, class 5 represents T3, class 6 represents T4, and class 7 represents T0/Tis. The NSCLC datasets contain a different number of samples for each class, and the class distribution is profoundly different, as described in Figure 12. In this figure, it is evident that there were few samples for class 6 (T4) and class 7 (T0/Tis). After removing the missing-class samples, the total number of samples from the three NSCLC datasets became 672, and they were used for staging.

[Figure 12: bar chart of the class distributions across the three NSCLC datasets; the underlying counts are:]

Class | NSCLC-Radiomics | NSCLC-Radiomics-Genomics | NSCLC-Radiogenomics
Class-1 | 93 | 21 | 40
Class-2 | 156 | 2 | 31
Class-3 | 53 | 39 | 47
Class-4 | 117 | 0 | 10
Class-5 | 2 | 22 | 21
Class-6 | 0 | 3 | 7
Class-7 | 0 | 2 | 6

Figure 12. Class distribution of NSCLC datasets.

Before we started the experiment, we performed the preparation of the data for training and testing. As real-world data is fuzzy in nature, we validated our CAD using repeated random trials. Figure 13 shows the preparation of data for repeated random trials in both the first and second stage classifications. For the first stage classification, the samples in the LIDC-IDRI dataset (888 samples) were randomly divided into five splits: 50%-50%, 60%-40%, 70%-30%, 80%-20% and 90%-10% for training and testing. Similarly, for the second stage classification, samples in the NSCLC datasets (672 samples) were randomly divided into five trials. In each split, we repeatedly selected random samples k = 100 times as described in the figure. Since the samples were randomly selected, each iteration k generated different samples for training and testing. We trained and tested five different classifiers, namely the Decision Tree (DT), K-nearest neighbor (KNN), Support Vector Machine (SVM), Ensemble Tree (ET), and the Back Propagation Neural Network (BPNN), on these random samples. In each iteration we calculated the performance measures of each classifier, then calculated the mean and standard deviation of the performance measures for each split.

Figure 13. Data preparation for repeated random trials.
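One split of this scheme can be sketched as follows; make_model is an assumed factory returning a fresh classifier, and accuracy stands in for the full set of measures:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def repeated_random_trials(make_model, X, y, test_size=0.3, k=100):
    """Mean and standard deviation of test accuracy over k random splits."""
    scores = []
    for i in range(k):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size,
                                                  random_state=i)
        model = make_model()               # fresh, untrained classifier
        model.fit(X_tr, y_tr)
        scores.append(accuracy_score(y_te, model.predict(X_te)))
    return float(np.mean(scores)), float(np.std(scores))
```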

The performance measurements which were calculated for each classifier are accuracy, sensitivity, specificity, precision, and the F-score. They can be computed by:

$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6)$$

$$\mathrm{sensitivity\ (recall)} = \frac{TP}{TP + FN} \qquad (7)$$

$$\mathrm{specificity} = \frac{TN}{FP + TN} \qquad (8)$$

$$\mathrm{precision} = \frac{TP}{FP + TP} \qquad (9)$$

$$F1\text{-}\mathrm{score} = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \qquad (10)$$
Nonetheless, the second stage of classification is a multi-class classification problem unlike the first; it is therefore more complex to evaluate the performance. To solve this problem, we used the "one versus the rest" method, which calculates the performance of the classifier for an individual class and then calculates the overall performance. For example, C_i (i = 1, 2, ..., l) is an individual class of a multi-class classification problem, and l is the total number of classes. From the counts of C_i, the assessments TP_i, TN_i, FP_i and FN_i were obtained. Then, the overall performance of the classifier for all classes was calculated by:

$$\mathrm{accuracy} = \frac{1}{l}\sum_{i=1}^{l}\frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i} \qquad (11)$$

$$\mathrm{sensitivity} = \frac{1}{l}\sum_{i=1}^{l}\frac{TP_i}{TP_i + FN_i} \qquad (12)$$

$$\mathrm{specificity} = \frac{1}{l}\sum_{i=1}^{l}\frac{TN_i}{TN_i + FP_i} \qquad (13)$$

$$\mathrm{precision} = \frac{1}{l}\sum_{i=1}^{l}\frac{TP_i}{TP_i + FP_i} \qquad (14)$$

$$F1\text{-}\mathrm{score} = \frac{1}{l}\sum_{i=1}^{l} 2 \times \frac{\mathrm{precision}_i \times \mathrm{recall}_i}{\mathrm{precision}_i + \mathrm{recall}_i} \qquad (15)$$
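Both the binary measures (6)-(10) and their one-versus-the-rest macro averages (11)-(15) follow directly from per-class confusion counts; a sketch (zero-count classes would need guarding in practice):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def one_vs_rest_metrics(y_true, y_pred, labels):
    """Per-class binary measures (6)-(10), macro-averaged as in (11)-(15)."""
    per_class = {"accuracy": [], "sensitivity": [], "specificity": [],
                 "precision": [], "f1": []}
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    for c in labels:                       # treat class c as the positive class
        tn, fp, fn, tp = confusion_matrix(y_true == c, y_pred == c,
                                          labels=[False, True]).ravel()
        per_class["accuracy"].append((tp + tn) / (tp + tn + fp + fn))
        per_class["sensitivity"].append(tp / (tp + fn))
        per_class["specificity"].append(tn / (tn + fp))
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        per_class["precision"].append(precision)
        per_class["f1"].append(2 * precision * recall / (precision + recall))
    return {name: float(np.mean(vals)) for name, vals in per_class.items()}
```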
In addition to these measurements, we also took into account the area under the curve (AUC) and the execution time of each classifier for performance evaluation. Table 5 shows the mean and standard deviation of the performance measurements obtained from each classifier in the first stage classification.

Table 5. Comparison of performance measurements generated by five different classifiers for first-stage classification.

Mean(±Standard Deviation) of Performance Assessments for 100 Random Iterations


Classifiers Measurements 50%-50% 60%-40% 70%-30% 80%-20% 90%-10%
Accuracy 0.82(±0.02) 0.82(±0.02) 0.83(±0.02) 0.83(±0.03) 0.83(±0.03)
Sensitivity 0.79(±0.03) 0.79(±0.03) 0.80(±0.03) 0.81(±0.05) 0.81(±0.06)
Specificity 0.86(±0.02) 0.86(±0.03) 0.86(±0.03) 0.86(±0.04) 0.86(±0.05)
DT Precision 0.85(±0.02) 0.85(±0.03) 0.85(±0.03) 0.85(±0.04) 0.85(±0.04)
F1-score 0.82(±0.02) 0.82(±0.02) 0.82(±0.02) 0.83(±0.03) 0.83(±0.04)
AUC 0.92(±0.01) 0.92(±0.01) 0.93(±0.01) 0.93(±0.02) 0.93(±0.02)
Time 0.10(±0.07) 0.10(±0.04) 0.13(±0.06) 0.14(±0.06) 0.15(±0.04)
Accuracy 0.87(±0.01) 0.87(±0.02) 0.87(±0.02) 0.87(±0.02) 0.88(±0.03)
Sensitivity 0.85(±0.03) 0.85(±0.03) 0.86(±0.03) 0.85(±0.04) 0.85(±0.05)
Specificity 0.89(±0.03) 0.89(±0.03) 0.89(±0.03) 0.89(±0.04) 0.90(±0.04)
KNN Precision 0.89(±0.02) 0.89(±0.03) 0.88(±0.03) 0.89(±0.03) 0.90(±0.04)
F1-score 0.87(±0.02) 0.87(±0.02) 0.87(±0.02) 0.87(±0.03) 0.87(±0.03)
AUC 0.95(±0.01) 0.95(±0.01) 0.95(±0.01) 0.95(±0.01) 0.95(±0.02)
Time 0.05(±0.14) 0.05(±0.12) 0.05(±0.13) 0.06(±0.16) 0.05(±0.12)
Accuracy 0.84(±0.02) 0.84(±0.02) 0.85(±0.02) 0.85(±0.02) 0.85(±0.03)
Sensitivity 0.79(±0.03) 0.79(±0.03) 0.80(±0.03) 0.80(±0.04) 0.80(±0.06)
Specificity 0.89(±0.02) 0.90(±0.02) 0.90(±0.02) 0.91(±0.03) 0.91(±0.04)
SVM Precision 0.88(±0.02) 0.89(±0.02) 0.89(±0.02) 0.90(±0.03) 0.90(±0.04)
F1-score 0.84(±0.02) 0.84(±0.02) 0.84(±0.02) 0.84(±0.03) 0.85(±0.04)
AUC 0.91(±0.01) 0.91(±0.02) 0.92(±0.01) 0.93(±0.02) 0.93(±0.02)
Time 0.06(±0.07) 0.06(±0.04) 0.08(±0.07) 0.09(±0.06) 0.11(±0.04)
Accuracy 0.86(±0.02) 0.86(±0.02) 0.86(±0.02) 0.86(±0.03) 0.86(±0.03)
Sensitivity 0.81(±0.03) 0.81(±0.03) 0.81(±0.03) 0.81(±0.04) 0.81(±0.06)
Specificity 0.90(±0.02) 0.90(±0.02) 0.90(±0.03) 0.91(±0.03) 0.91(±0.04)
ET Precision 0.89(±0.02) 0.90(±0.02) 0.89(±0.03) 0.90(±0.04) 0.90(±0.04)
F1-score 0.85(±0.02) 0.85(±0.02) 0.85(±0.02) 0.85(±0.03) 0.85(±0.04)
AUC 0.93(±0.01) 0.93(±0.02) 0.93(±0.01) 0.93(±0.02) 0.93(±0.02)
Time 0.48(±0.17) 0.45(±0.16) 0.51(±0.15) 0.53(±0.15) 0.54(±0.11)
Accuracy 0.92(±0.01) 0.93(±0.02) 0.93(±0.02) 0.93(±0.02) 0.93(±0.02)
Sensitivity 0.93(±0.02) 0.93(±0.03) 0.94(±0.03) 0.94(±0.03) 0.94(±0.04)
Specificity 0.91(±0.03) 0.92(±0.03) 0.92(±0.03) 0.92(±0.03) 0.92(±0.04)
BPNN Precision 0.91(±0.02) 0.92(±0.03) 0.92(±0.03) 0.92(±0.03) 0.92(±0.04)
F1-score 0.92(±0.02) 0.93(±0.02) 0.93(±0.02) 0.93(±0.02) 0.93(±0.03)
AUC 0.96(±0.02) 0.97(±0.02) 0.97(±0.02) 0.97(±0.01) 0.97(±0.02)
Time 0.58(±0.17) 0.57(±0.22) 0.67(±0.18) 0.72(±0.15) 0.76(±0.17)

Similarly, Table 6 describes the comparison of performance measurements generated by the five
different classifiers for the second stage classification.

Table 6. Comparison of performance measurements generated by five different classifiers for second-stage classification.

Mean(±Standard Deviation) of Performance Assessments for 100 Random Iterations


Classifiers Measurements 50%-50% 60%-40% 70%-30% 80%-20% 90%-10%
Accuracy 0.79(±0.03) 0.76(±0.03) 0.75(±0.03) 0.74(±0.03) 0.73(±0.03)
Sensitivity 0.62(±0.08) 0.58(±0.07) 0.58(±0.06) 0.58(±0.06) 0.56(±0.05)
Specificity 0.96(±0.01) 0.95(±0.01) 0.95(±0.01) 0.95(±0.00) 0.95(±0.01)
DT Precision 0.85(±0.03) 0.85(±0.03) 0.84(±0.04) 0.83(±0.03) 0.83(±0.03)
F1-score 0.63(±0.08) 0.59(±0.07) 0.59(±0.06) 0.59(±0.07) 0.57(±0.06)
AUC 0.71(±0.05) 0.69(±0.07) 0.68(±0.04) 0.68(±0.04) 0.67(±0.04)
Time 0.05(±0.16) 0.05(±0.14) 0.05(±0.14) 0.06(±0.13) 0.06(±0.13)
Accuracy 0.55(±0.04) 0.57(±0.03) 0.59(±0.04) 0.61(±0.04) 0.62(±0.03)
Sensitivity 0.58(±0.05) 0.58(±0.04) 0.50(±0.05) 0.60(±0.05) 0.53(±0.05)
Specificity 0.91(±0.01) 0.92(±0.01) 0.92(±0.01) 0.93(±0.01) 0.93(±0.01)
KNN Precision 0.67(±0.08) 0.71(±0.04) 0.70(±0.07) 0.72(±0.07) 0.73(±0.06)
F1-score 0.52(±0.15) 0.54(±0.12) 0.55(±0.15) 0.54(±0.17) 0.58(±0.14)
AUC 0.48(±0.05) 0.50(±0.04) 0.51(±0.05) 0.52(±0.05) 0.53(±0.04)
Time 0.04(±0.04) 0.04(±0.04) 0.04(±0.04) 0.06(±0.04) 0.06(±0.05)
Accuracy 0.46(±0.14) 0.45(±0.12) 0.50(±0.12) 0.52(±0.13) 0.52(±0.11)
Sensitivity 0.62(±0.11) 0.60(±0.11) 0.66(±0.09) 0.67(±0.11) 0.67(±0.10)
Specificity 0.91(±0.02) 0.91(±0.02) 0.92(±0.02) 0.92(±0.02) 0.92(±0.02)
SVM Precision 0.48(±0.12) 0.47(±0.10) 0.50(±0.09) 0.52(±0.10) 0.50(±0.09)
F1-score 0.66(±0.16) 0.69(±0.15) 0.65(±0.13) 0.62(±0.14) 0.66(±0.13)
AUC 0.54(±0.12) 0.53(±0.11) 0.57(±0.09) 0.58(±0.10) 0.57(±0.09)
Time 0.36(±0.08) 0.37(±0.07) 0.35(±0.07) 0.45(±0.13) 0.41(±0.15)
Accuracy 0.94(±0.03) 0.91(±0.03) 0.89(±0.03) 0.88(±0.03) 0.87(±0.03)
Sensitivity 0.56(±0.32) 0.63(±0.28) 0.73(±0.18) 0.73(±0.21) 0.79(±0.07)
Specificity 0.98(±0.01) 0.98(±0.01) 0.98(±0.01) 0.98(±0.01) 0.98(±0.01)
ET Precision 0.92(±0.06) 0.93(±0.05) 0.94(±0.03) 0.94(±0.04) 0.95(±0.01)
F1-score 0.57(±0.33) 0.65(±0.28) 0.75(±0.19) 0.75(±0.22) 0.81(±0.08)
AUC 0.63(±0.36) 0.71(±0.30) 0.80(±0.19) 0.79(±0.22) 0.86(±0.05)
Time 0.29(±0.11) 0.29(±0.10) 0.29(±0.10) 0.37(±0.13) 0.33(±0.14)
Accuracy 0.90(±0.05) 0.90(±0.03) 0.91(±0.02) 0.91(±0.02) 0.91(±0.02)
Sensitivity 0.84(±0.09) 0.79(±0.08) 0.78(±0.08) 0.74(±0.08) 0.72(±0.07)
Specificity 0.98(±0.00) 0.97(±0.01) 0.98(±0.01) 0.97(±0.00) 0.97(±0.01)
BPNN Precision 0.96(±0.02) 0.95(±0.02) 0.93(±0.02) 0.93(±0.02) 0.92(±0.02)
F1-score 0.85(±0.08) 0.81(±0.08) 0.80(±0.08) 0.77(±0.08) 0.75(±0.07)
AUC 0.89(±0.05) 0.86(±0.05) 0.85(±0.05) 0.82(±0.05) 0.81(±0.04)
Time 0.48(±0.11) 0.48(±0.08) 0.48(±0.09) 0.62(±0.16) 0.59(±0.20)

For graphical representation, box-plot distributions were applied to compare the performance of the five alternative classifiers. Figure 14 illustrates the box-plot comparison between the mean performance measurements generated by each classifier in the first stage classification. Similarly, Figure 15 shows the box-plot comparison for the second stage classification. Analyzing the distributions of the box-plots, we can see that BPNN offers significantly higher performance than the other classifiers in the first stage classification. However, for the second stage classification, the performance of the classifiers declined compared to the first stage because of class imbalance in the samples. The performance of DT, KNN and SVM decreased, which indicates that they are not robust enough for random data with class imbalance. On the other hand, the ET and BPNN are robust for random data. The performance of ET seemed better than that of BPNN, especially in specificity and precision, because its box-plots had upper middle lines (higher median values) and a small range of whiskers (condensed data). But in other assessments such as accuracy, sensitivity, F1-score and AUC, BPNN showed higher values. Therefore, we decided to use BPNN for both stages of classification. As a result, BPNN reached an accuracy of 92.8%, a sensitivity of 93.6%, a specificity of 91.8%, a precision of 91.8%, an F1-score of 92.8% and an AUC of 96.8% for detection. For staging, it achieved an accuracy of 90.6%, a sensitivity of 77.4%, a specificity of 97.4%, a precision of 93.8%, an F1-score of 79.6% and an AUC of 84.6%, respectively. Even though BPNN took more execution time, it remains reliable on random and imbalanced samples.

Figure 14. First stage classification: Box-plot distributions of mean performance measurements for five alternative classifiers using repeated random trials.

Figure 15. Second stage classification: Box-plot distributions of mean performance measurements for five alternative classifiers using repeated random trials.
As an example, the outputs of our proposed CAD are illustrated in Figure 16. They demonstrate 3D visualizations of tumors of different stages. Figure 16a represents a stage T1a tumor that is 1.83 cm in its greatest dimension and is located in the left lower lobe in an isolated manner. Figure 16b demonstrates a stage T1b tumor which is also located in the left lower lobe, but is bigger in size (2.14 cm) than the tumor in Figure 16a.

[Figure 16 panels: (a) Stage T1a; (b) Stage T1b; (c) Stage T2a; (d) Stage T2b (invades visceral pleura); (e) Stage T3 (invades chest wall); (f) Stage T3 (satellite nodules in the same lobe); (g) Stage T4 (satellite nodules in different lobes); (h) Stage T4 (invades mediastinum).]

Figure 16. Three-dimensional visualization for some example outputs of the proposed CAD: (a) T1a-staged tumor nodule, (b) T1b-staged tumor nodule, (c) T2a-staged tumor nodule, (d) T2b-staged tumor nodule which invades the pleura, (e) T3-staged tumor nodule which invades the chest wall, (f) T3-staged tumor nodule with satellite nodules in the same lobe, (g) T4-staged tumor nodule with satellite nodules in different lobes, (h) T4-staged tumor nodule which invades the mediastinum.

Then, Figure 16c,d represents stage T2a and T2b tumors, respectively. Both tumors are located in the right upper lobes. The tumor in Figure 16c is 3.05 cm in its greatest dimension and is isolated, whereas the tumor in Figure 16d is 2.10 cm and touches the visceral pleura. Hence, the tumor in Figure 16d has a higher stage even though it is smaller in size.
Figure 16e,f illustrates the stage T3 tumors. The tumor in Figure 16e is located in the right upper lobe and invades the chest wall. Also, Figure 16f demonstrates the presence of satellite nodules in the same lobe, the right upper lobe. Unlike Figure 16f, the tumors in Figure 16g contain satellite nodules in different lobes: one is in the right middle lobe, and another is in the left upper lobe. Thus, it is classified as stage T4. Finally, in Figure 16h, we can see a T4-staged tumor nodule which invades the mediastinum.
Moreover, we compared the performance of our proposed CAD with methods presented in previous works, as stated in Table 7. It was difficult to make a fair comparison because the CADs were developed using different datasets, different numbers of samples, and different methods. Thus, we could only make a quantitative comparison of the performance outcomes. Based on these quantitative comparative results, it is evident that our proposed CAD is validated using not only a higher number of samples, but also random samples, and it can provide preferable performance in both detection and staging.

Table 7. Quantitative performance comparison of the proposed CAD method with methods presented
in previous works.

CADs | Methods | Dataset | Samples | Accuracy
[10] | Gabor filter; marker-controlled watershed; geometric features (area, perimeter, eccentricity); SVM | NIH/NCI Lung Image Consortium (LIDC) | 40 | Not described
[11] | Sharpening; marker-controlled watershed; area, perimeter, eccentricity, convex area, mean intensity; thresholding; random tree | Regional Cancer Centre Trivandrum | 200 | 94.4%
[12] | Median filter; watershed transform; textural features; multi-class SVM classifier | Local dataset (JPEG format); UCI lung dataset (for training SVM) | 500 | Detection—97%; Staging—87%
[14] | 3D bounding boxes (manual); CNN | Institutional dataset (FDG-PET)/CT | 472 | Staging—90%
[15] | K-means clustering; double CDNN | LONI dataset | 95 | Detection—99.6%
Proposed method | HU-based thresholding; 3D nodule features (geometric, textural, intensity) and locational features; double-staged BPNN | LIDC-IDRI; NSCLC-Radiomics-Genomics; NSCLC-Radiomics; NSCLC Radiogenomics | 1560 | Detection—92.8%; Staging—90.6%

5. Discussion and Conclusions


This paper presents a CAD capable of the detection and staging of lung tumors from computed
tomography (CT) scans. Our CAD has two major contributions. The first contribution is the extraction
of locational features from tumor nodules for the T-staging of lung cancer. The second is the double
use of the BPNN classifier for the detection and staging of the tumor nodules.
The CAD works in three pivotal phases called segmentation, detection, and staging. A single
CT exam of a patient containing multiple 2D slices is an input of the proposed CAD. From each
input exam, the lung anatomical structures, such as the lung regions, trachea, bronchial airway tree,
pulmonary vessels, fissures, lobes, and nodule candidates, are initially segmented using gray-level
thresholding and isotropic interpolation methods. Subsequently, 28 nodule features are extracted
from the segmented nodule candidates to detect the true tumor nodules. These features include
four geometric, five intensity, and nineteen texture features. Using these features, the first stage of
classification is performed and the true tumor nodules are segregated.
Once the true tumor nodules have been achieved, the locational features are added for staging.
With the help of the segmented lung anatomical structures from the first phase, the locational features
can be defined by dividing the lungs into different zones. Moreover, the invasion and attachment of the
tumor nodules to other pulmonary organs are also checked by using the 3D quick hull algorithm and
logical array checking. As well as this, the number of true tumor nodules predicted in a single test and
their lobe locations are also extracted in this stage. Using these features, the staging of tumor nodules
Appl. Sci. 2019, 9, 2329 24 of 26

is conducted by second-stage classification. In this study, we applied the BPNN for both classifications.
Before selecting the BPNN, we tested and compared five alternative classifiers using repeated random trials. Based on the comparative results, we determined that the BPNN should be used due to its high performance and robustness.
Our proposed CAD was developed and tested using 1560 CT exams from four popular public
datasets: LIDC-IDRI, NSCLC-Radiomics-Genomics, NSCLC-Radiomics and NSCLC Radiogenomics.
In the performance evaluation, our proposed CAD achieved a detection accuracy of 92.8%, a sensitivity
of 93.6%, a specificity of 91.8%, a precision of 91.8%, an F-score of 92.8%, and an AUC of 96.8%. For the
staging, we achieved an accuracy of 90.6%, a sensitivity of 77.4%, a specificity of 97.4%, a precision
of 93.8%, an F1-score of 79.6% and an AUC of 84.6%, respectively. Compared with previous works, our CAD was evaluated with more samples and obtained promising performances.
Unlike most existing CADs for lung cancer diagnosis, our CAD makes possible not only the
detection, but also the staging of lung tumor nodules. Our first contribution, the extraction of locational
features, has advantages for the staging of tumor nodules, because the severity of lung cancer is
strongly associated with the size and location of tumor nodules. Knowing the exact location of tumors
reflects accurate staging. Patient survival rates can also be increased by providing precise treatment
options depending on the tumor stage. Moreover, locational features produced by our CAD will be
useful for the clinical surgery planning of advanced-stage tumors. The next contribution, the double stage of classification using BPNN, makes the CAD more robust. As the first stage classification
filters only the true tumor nodules, we do not need to extract the locational features from all nodule
candidates; this saves both computational effort and time. Nonetheless, our study has some limitations
in terms of the need for background knowledge for anatomical lung structures. Moreover, our CAD
only focuses on the T-staging of lung cancer. Thus, there are some open issues which can be further
upgraded for the N and M-staging of lung cancer.

Supplementary Materials: This paper uses four popular public datasets, namely LIDC-IDRI, NSCLC-Radiomics-Genomics, NSCLC-Radiomics and NSCLC Radiogenomics. They are available online at https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI, https://wiki.cancerimagingarchive.net/display/Public/NSCLC-Radiomics-Genomics, https://wiki.cancerimagingarchive.net/display/Public/NSCLC-Radiomics, and https://wiki.cancerimagingarchive.net/display/Public/NSCLC+Radiogenomics, respectively.
Author Contributions: M.P.P. and C.P. conceived and designed this study. M.P.P. performed the experiments,
simulations, and original draft preparation of the paper. C.P., S.T. and K.H. reviewed and edited the paper.
Funding: This research received no external funding.
Acknowledgments: The authors express sincere gratitude to the ASEAN University Network/Southeast Asia
Engineering Education Development Network (AUN/SEED-Net) and the Japan International Cooperation Agency
(JICA) for financial support. We would also like to thank The Cancer Imaging Archive (TCIA) for publically
sharing the CT scan images and clinical data which were applied in our experiments.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018:
GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J.
Clin. 2018, 68, 394–424. [CrossRef] [PubMed]
2. Aerts, H.J.W.L.; Velazquez, E.R.; Leijenaar, R.T.H.; Parmar, C.; Grossmann, P.; Carvalho, S.; Bussink, J.;
Monshouwer, R.; Haibe-Kains, B.; Rietveld, D.; et al. Decoding tumour phenotype by noninvasive imaging
using a quantitative radiomics approach. Nat. Commun. 2014, 5, 4006. [CrossRef] [PubMed]
3. The National Lung Screening Trial Research Team. Reduced Lung-Cancer Mortality with Low-Dose
Computed Tomographic Screening. N. Engl. J. Med. 2011, 365, 395–409. [CrossRef] [PubMed]
4. Xie, H.; Yang, D.; Sun, N.; Chen, Z.; Zhang, Y. Automated pulmonary nodule detection in CT images using
deep convolutional neural networks. Pattern Recognit. 2019, 85, 109–119. [CrossRef]
5. Beigelman-Aubry, C.; Hill, C.; Grenier, P.A. Management of an incidentally discovered pulmonary nodule.
Eur. Radiol. 2007, 17, 449–466. [CrossRef] [PubMed]

6. Heuvelmans, M.A.; Walter, J.E.; Peters, R.B.; de Bock, G.H.; Yousaf-Khan, U.; van der Aalst, C.M.; Groen, H.J.;
Nackaerts, K.; van Ooijen, P.M.; de Koning, H.J.; et al. Relationship between nodule count and lung
cancer probability in baseline CT lung cancer screening: The NELSON study. Lung Cancer 2017, 113, 45–50.
[CrossRef] [PubMed]
7. Weissferdt, A.; Moran, C.A. Diagnostic Pathology of Pleuropulmonary Neoplasia; Springer: New York, NY, USA,
2013; ISBN 978-1-4419-0786-8.
8. Oda, H.; Bhatia, K.K.; Oda, M.; Kitasaka, T.; Iwano, S.; Homma, H.; Takabatake, H.; Mori, M.; Natori, H.;
Schnabel, J.A.; et al. Automated mediastinal lymph node detection from CT volumes based on intensity
targeted radial structure tensor analysis. J. Med. Imaging 2017, 4, 044502. [CrossRef]
9. Loutfi, S.; Khankan, A.; Ghanim, S.A. Guidelines for multimodality radiological staging of lung cancer.
J. Infect. Public Health 2012, 5, S14–S21. [CrossRef]
10. Kulkarni, A.; Panditrao, A. Classification of lung cancer stages on CT scan images using image processing.
In Proceedings of the 2014 IEEE International Conference on Advanced Communications, Control and
Computing Technologies, Ramanathapuram, India, 8–10 May 2014; IEEE: Ramanathapuram, India, 2014;
pp. 1384–1388.
11. Ignatious, S.; Joseph, R.; John, J.; Prahladan, A. Computer Aided Lung Cancer Detection and Tumor Staging
in CT image using Image Processing. Int. J. Comput. Appl. 2015, 128, 29–33. [CrossRef]
12. Alam, J.; Alam, S.; Hossan, A. Multi-Stage Lung Cancer Detection and Prediction Using Multi-class SVM
Classifier. In Proceedings of the 2018 International Conference on Computer, Communication, Chemical,
Material and Electronic Engineering (IC4ME2), Rajshahi, Bangladesh, 8–9 February 2018; IEEE: Rajshahi,
Bangladesh, 2018; pp. 1–4.
13. Jiao, Z.; Gao, X.; Wang, Y.; Li, J.; Xu, H. Deep Convolutional Neural Networks for mental load classification
based on EEG data. Pattern Recognit. 2018, 76, 582–595. [CrossRef]
14. Kirienko, M.; Sollini, M.; Silvestri, G.; Mognetti, S.; Voulaz, E.; Antunovic, L.; Rossi, A.; Antiga, L.;
Chiti, A. Convolutional Neural Networks Promising in Lung Cancer T-Parameter Assessment on Baseline
FDG-PET/CT. Contrast Media Mol. Imaging 2018, 2018, 1382309. [CrossRef]
15. Jakimovski, G.; Davcev, D. Using Double Convolution Neural Network for Lung Cancer Stage Detection.
Appl. Sci. 2019, 9, 427. [CrossRef]
16. Sylvan, E. CT-Based Measurement of Lung Volume and Attenuation of Deceased. Master's Thesis, Linköping University, Linköping, Sweden, 23 September 2005.
17. Zhou, X.; Hayashi, T.; Hara, T.; Fujita, H.; Yokoyama, R.; Kiryu, T.; Hoshi, H. Automatic segmentation and
recognition of anatomical lung structures from high-resolution chest CT images. Comput. Med. Imaging
Graph. 2006, 30, 299–313. [CrossRef] [PubMed]
18. Walker, C.M.; Rosado-de-Christenson, M.L.; Martínez-Jiménez, S.; Kunin, J.R.; Wible, B.C. Bronchial Arteries:
Anatomy, Function, Hypertrophy, and Anomalies. RadioGraphics 2015, 35, 32–49. [CrossRef] [PubMed]
19. Drummond, G.B. Computed tomography and pulmonary measurements. Br. J. Anaesth. 1998, 80, 665–671.
[CrossRef] [PubMed]
20. Özsavaş, E.E.; Telatar, Z.; Dirican, B.; Sağer, Ö.; Beyzadeoğlu, M. Automatic Segmentation of Anatomical
Structures from CT Scans of Thorax for RTP. Comput. Math. Methods Med. 2014, 2014, 472890. [CrossRef]
[PubMed]
21. Orkisz, M.; Hernández Hoyos, M.; Pérez Romanello, V.; Pérez Romanello, C.; Prieto, J.C.; Revol-Muller, C.
Segmentation of the pulmonary vascular trees in 3D CT images using variational region-growing. IRBM
2014, 35, 11–19. [CrossRef]
22. Ko, J.P.; Marcus, R.; Bomsztyk, E.; Babb, J.S.; Kaur, M.; Naidich, D.P.; Rusinek, H. Effect of Blood Vessels on
Measurement of Nodule Volume in a Chest Phantom. Radiology 2006, 239, 79–85. [CrossRef] [PubMed]
23. Frangi, A.F.; Niessen, W.J.; Vincken, K.L.; Viergever, M.A. Multiscale vessel enhancement filtering. In Medical
Image Computing and Computer-Assisted Intervention—MICCAI’98; Wells, W.M., Colchester, A., Delp, S., Eds.;
Springer: Berlin/Heidelberg, Germany, 1998; Volume 1496, pp. 130–137. ISBN 978-3-540-65136-9.
24. Goncalves, L.; Novo, J.; Campilho, A. Feature definition, analysis and selection for lung nodule classification
in chest computerized tomography images. In Proceedings of the 2016 European Symposium on Artificial
Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 27–29 April 2016.
25. Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural Features for Image Classification. IEEE Trans. Syst.
Man Cybern. 1973, SMC-3, 610–621. [CrossRef]

26. Dhara, A.K.; Mukhopadhyay, S.; Dutta, A.; Garg, M.; Khandelwal, N. A Combination of Shape and Texture
Features for Classification of Pulmonary Nodules in Lung CT Images. J. Digit. Imaging 2016, 29, 466–475.
[CrossRef] [PubMed]
27. Kollmannsberger, P.; Kerschnitzki, M.; Repp, F.; Wagermaier, W.; Weinkamer, R.; Fratzl, P. The small world
of osteocytes: Connectomics of the lacuno-canalicular network in bone. New J. Phys. 2017, 19, 073019.
[CrossRef]
28. Barber, C.B.; Dobkin, D.P.; Huhdanpaa, H. The quickhull algorithm for convex hulls. ACM Trans. Math.
Softw. 1996, 22, 469–483. [CrossRef]
29. Armato, S.G.; McLennan, G.; Bidaut, L.; McNitt-Gray, M.F.; Meyer, C.R.; Reeves, A.P.; Clarke, L.P. Data
From LIDC-IDRI. The Cancer Imaging Archive. 2015. Available online: https://wiki.cancerimagingarchive.net/
display/Public/LIDC-IDRI#b858dd6a871345b6a98471efa02052a8 (accessed on 5 June 2019).
30. Armato, S.G.; McLennan, G.; Bidaut, L.; McNitt-Gray, M.F.; Meyer, C.R.; Reeves, A.P.; Zhao, B.; Aberle, D.R.;
Henschke, C.I.; Hoffman, E.A.; et al. The Lung Image Database Consortium (LIDC) and Image Database
Resource Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans. Med Phys. 2011,
38, 915–931. [CrossRef] [PubMed]
31. Aerts, H.J.; Rios Velazquez, E.; Leijenaar, R.T.; Parmar, C.; Grossmann, P.; Carvalho, S.; Lambin, P. Data
from NSCLC-Radiomics-Genomics. The Cancer Imaging Archive, 2015. Available online: https://wiki.
cancerimagingarchive.net/display/Public/NSCLC-Radiomics-Genomics#a9c4326b5df441359a1d0c2bab5e612f
(accessed on 5 June 2019).
32. Aerts, H.J.; Rios Velazquez, E.; Leijenaar, R.T.; Parmar, C.; Grossmann, P.; Carvalho, S.; Lambin, P. Data from
NSCLC-Radiomics. The Cancer Imaging Archive, 2015. Available online: https://wiki.cancerimagingarchive.
net/display/Public/NSCLC-Radiomics#3946d6a325924b00a3b127ae41205ab7 (accessed on 5 June 2019).
33. Bakr, S.; Gevaert, O.; Echegaray, S.; Ayers, K.; Zhou, M.; Shafiq, M.; Zheng, H.; Zhang, W.;
Leung, A.; Kadoch, M.; et al. Data for NSCLC Radiogenomics Collection. The Cancer Imaging Archive.
2017. Available online: https://wiki.cancerimagingarchive.net/display/Public/NSCLC+Radiogenomics#
6da0546d416d44b3be8e9e6bd7b95b2b (accessed on 5 June 2019).
34. Gevaert, O.; Xu, J.; Hoang, C.D.; Leung, A.N.; Xu, Y.; Quon, A.; Rubin, D.L.; Napel, S.; Plevritis, S.K.
Non–Small Cell Lung Cancer: Identifying Prognostic Imaging Biomarkers by Leveraging Public Gene
Expression Microarray Data—Methods and Preliminary Results. Radiology 2012, 264, 387–396. [CrossRef]
[PubMed]
35. Clark, K.; Vendt, B.; Smith, K.; Freymann, J.; Kirby, J.; Koppel, P.; Moore, S.; Phillips, S.; Maffitt, D.;
Pringle, M.; et al. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information
Repository. J. Digit. Imaging 2013, 26, 1045–1057. [CrossRef] [PubMed]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
