DEEP LEARNING
A PROJECT REPORT
Submitted by
DIVAGAR. L 721917106021
SATHYA. H 721917106075
THILAKESH. A 721917106090
of
BACHELOR OF ENGINEERING
in
APRIL 2021
ANNA UNIVERSITY : CHENNAI 600 025
BONAFIDE CERTIFICATE
__________________ ____________________
SIGNATURE SIGNATURE
Mr. S. MUKUNTHAN, M.TECH.          Mrs. S. G. RAMA PRIYANGA, M.E.
----------------------- ---------------------------
INTERNAL EXAMINER EXTERNAL EXAMINER
ACKNOWLEDGEMENT
We thank our beloved principal Dr. P. MALATHI, M.E., Ph.D., for her valuable support and encouragement.
We express heartfelt thanks to our parents and friends for their support
throughout our career. We would like to thank everyone who had helped us
directly and indirectly in this project work. We thank the Lord Almighty.
ABSTRACT
Lung diseases are disorders that affect the lungs, the organs that enable breathing. Lung cancer is one of the most common causes of death among people throughout the world, and early detection of lung cancer can increase the chance of survival.
The overall survival rate for lung cancer patients increases from 14% to 49% if the disease is detected in time.
Although Computed Tomography (CT) is more efficient than X-ray imaging, a comprehensive diagnosis generally requires multiple imaging methods that complement each other.
In this work, a deep neural network that identifies lung cancer from CT images is proposed. A densely connected convolutional neural network (DenseNet) and the adaptive boosting (AdaBoost) algorithm are used to classify a lung as normal or malignant.
A dataset of 201 lung images is used, in which 85% of the images are used for training and 15% are used for testing and classification. Experimental results show that the proposed method achieves an accuracy of 90.85%.
LIST OF CONTENTS
1 INTRODUCTION
1.1 Prediction
1.2 Staging
1.3 Survey
2 LITERATURE SURVEY
2.1 Lung cancer detection & classification using deep learning
2.2 Detection and classification of lung abnormalities by use of convolutional neural network (CNN) and regions with CNN features (R-CNN)
2.3 Detection and classification of pulmonary nodules using convolutional neural network
2.4 Multiple resolution residually connected feature streams, automatic lung tumor segmentation from CT images
2.5 Lung image patch classification with automatic feature learning
3 SYSTEM REQUIREMENTS
3.1 Software Used
3.2 Language
3.3 Software Details
3.4 Python Modeling and Simulation
3.5 Python Advantages
3.6 Python Applications
6 MODULE DESCRIPTION
6.1 Pre-processing
6.2 Feature Selection
6.3 Feature Extraction
6.4 CNN Layers
6.5 Data Augmentation
6.6 AdaBoost Algorithm
7 RESULT AND ANALYSIS
7.1 Training Part Result
7.2 Classification Part Result
8 CONCLUSION
REFERENCES
LIST OF FIGURES
Fig 6.4.1 Fully Connected Structure
Fig 7.1 Training Part Result
LIST OF ABBREVIATIONS
ACRONYM EXPANSION
ADABOOST Adaptive Boosting
AI Artificial Intelligence
API Application Programming Interface
CAD Computer Aided Diagnosis
CNN Convolutional Neural Network
CT Computed Tomography
EDA Exploratory Data Analysis
GLCM Gray-Level Co-occurrence Matrix
HU Hounsfield Unit
IDRI Image Database Resource Initiative
LIDC Lung Image Database Consortium
LUNA16 Lung Nodule Analysis 2016
MRI Magnetic Resonance Imaging
MRRN Multiple Resolution Residually Connected Network
NSCLC Non-Small Cell Lung Cancer
PyPI Python Package Index
ROI Region of Interest
CHAPTER-1
INTRODUCTION
Lung cancer has become one of the most common causes of death in the world. It is one of the malignant tumors most harmful to human health: its mortality rate ranks first among malignant tumors, and it is the number one cancer killer among men and women worldwide. There are about 1.8 million new cases of lung cancer per year (13% of all tumors) and 1.6 million deaths (19.4% of all tumor deaths) in the world [4], and the 5-year survival rate is only 18%. Lung cancer is a disease in which abnormal cells multiply and grow into a tumor. The mortality rate of lung cancer is the highest among all types of cancer. An estimated 85 percent of lung cancer cases in males and 75 percent in females are caused by cigarette smoking. Lung cancer is one of the most dreadful diseases in the developing countries, and its mortality rate is 19.4%.
1.1. PREDICTION
Lung cancer is one of the most serious cancers in the world, with the lowest survival rate after diagnosis and a gradual increase in the number of deaths every year. Survival from lung cancer is directly related to how far the cancer has grown at detection time, and people have a much higher chance of survival if the cancer is detected in its early stages. Cancer cells can be carried away from the lungs in blood or in the lymph fluid that surrounds lung tissue. Lymph flows through lymphatic vessels, which drain into lymph nodes located in the lungs and in the centre of the chest. Lung cancer is one of the deadliest diseases in the developing countries, and detecting the cancer at an early stage is a challenge. Analysis and cure of lung malignancy have been among the greatest difficulties faced by humans over the recent couple of decades, and early identification of tumors would help save a huge number of lives across the globe every year. This work presents an approach which utilizes a Convolutional Neural Network (CNN) to classify the tumors found in the lung as malignant or benign; the accuracy obtained by this approach is 90.85%.
1.2. STAGING
Lung cancer often spreads toward the center of the chest, because the natural flow of lymph out of the lungs is toward the center of the chest. Lung cancer can be divided into two main groups, non-small cell lung cancer and small cell lung cancer; these types are assigned based on their cellular characteristics. Staging is based on tumor size and lymph node location. Presently, CT is said to be more effective than plain chest X-ray in detecting and diagnosing lung cancer. Early detection of lung tumors is done using many imaging techniques such as Computed Tomography (CT), Sputum Cytology, Chest X-ray and Magnetic Resonance Imaging (MRI). Detection means classifying a tumor into two classes: (i) non-cancerous tumor (benign) and (ii) cancerous tumor (malignant). The chance of survival at an advanced stage is low compared with diagnosis at an early stage, when treatment and lifestyle give patients a better chance of surviving cancer therapy. Manual analysis and diagnosis can be greatly improved with the implementation of image processing techniques. A number of studies on image processing techniques for early-stage cancer detection are available in the literature, but the hit ratio of early-stage detection has not greatly improved. With advancements in machine learning techniques, early diagnosis of cancer has been attempted by many researchers. Neural networks play a key role in recognizing cancer cells among normal tissues, which in turn provides an effective tool for building assistive AI-based cancer detection. Cancer treatment is effective only when the tumor cells are accurately separated from the normal cells. Classification of the tumor cells and training of the neural network form the basis for machine learning based cancer diagnosis.
1.3. SURVEY
Lung cancer has become one of the most significant diseases in human history. The World Health Organization estimates that the worldwide death toll from lung cancer will reach 10,000,000 by 2030. The 5-year survival rate for advanced Non Small Cell Lung Cancer (NSCLC) remains disappointingly low. It has been hypothesized that quantitative image feature analysis can improve diagnostic, prognostic or predictive accuracy, and therefore will have an impact on a significant number of patients. In the current study, standard-of-care clinical computed tomography (CT) scans were used for image feature extraction. In order to reduce variability in feature extraction, the first and essential step is to accurately delineate the lung tumors. Accurate delineation of lung tumors is also crucial for optimal radiation oncology. A common approach to delineating tumors from CT scans involves radiologists or radiation oncologists manually drawing the boundary of the tumor. In the majority of cases, manual segmentation overestimates the lesion volume to ensure the entire lesion is identified, and the process is highly variable. A stable, accurate segmentation is critical, as image features (such as texture and shape related features) are sensitive to small tumor boundary changes.
CHAPTER -2
LITERATURE SURVEY
2.1. TITLE: Lung cancer detection & classification using deep learning
DRAWBACK
The lack of strict clinical guidelines and the resemblance between the different ILD findings make the radiological diagnosis problematic.
2.2. TITLE: Detection and classification of lung abnormalities by use of convolutional neural network (CNN) and regions with CNN features (R-CNN) - 2018
AUTHORS: Shoji Kido, Yasushi Hirano, Noriaki Hashimoto.
DRAWBACK
2.3. TITLE: Detection and classification of pulmonary nodules using convolutional neural network
The number of radiologists available to read CT scans is limited, and they have been overworked. Recently, numerous methods, especially ones based on deep learning with convolutional neural networks (CNN), have been developed to automatically detect and classify pulmonary nodules in medical images. This paper presents a comprehensive analysis of these methods and their performances. First, it briefly introduces the fundamental knowledge of CNNs as well as the reasons for their suitability to medical image analysis.
2.5. TITLE: Lung Image Patch Classification with Automatic Feature
Learning-2015
CHAPTER -3
SYSTEM REQUIREMENTS
3.1. SOFTWARE USED
SOFTWARE : PYTHON
3.2. LANGUAGE
The Python standard library has two modules (itertools and functools) that implement functional tools borrowed from Haskell and Standard ML.
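A quick sketch of the functional style these two modules enable; all of the calls below are standard-library:

```python
from functools import reduce
from itertools import accumulate, chain

# Running sums, in the style of Haskell's scanl.
print(list(accumulate([1, 2, 3, 4])))                   # [1, 3, 6, 10]

# Fold a list down to one value, like a left fold in ML.
print(reduce(lambda acc, x: acc * x, [1, 2, 3, 4], 1))  # 24

# Lazily join iterables without building an intermediate list.
print(list(chain("ab", "cd")))                          # ['a', 'b', 'c', 'd']
```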
Rather than having all of its functionality built into its core, Python was
designed to be highly extensible. This compact modularity has made it
particularly popular as a means of adding programmable interfaces to existing
applications. Van Rossum's vision of a small core language with a large
standard library and easily extensible interpreter stemmed from his frustrations
with ABC, which espoused the opposite approach. Python strives for a simpler,
less-cluttered syntax and grammar while giving developers a choice in their
coding methodology. In contrast to Perl's "there is more than one way to do it"
motto, Python embraces a "there should be one—and preferably only one—
obvious way to do it" design philosophy. Alex Martelli, a Fellow at the Python
Software Foundation and Python book author, writes that "To describe
something as 'clever' is not considered a compliment in the Python culture."
Code that is difficult to understand or reads like a rough transcription from another programming language is called unpythonic. Users and admirers of Python, especially those considered knowledgeable or experienced, are often referred to as Pythonistas.
3.4. PYTHON MODELING AND SIMULATION
• The first part presents discrete models, including a bike share system and
world population growth.
• The second part introduces first-order systems, including models of
infectious disease, thermal systems, and pharmacokinetics.
• The third part is about second-order systems, including mechanical
systems like projectiles, celestial mechanics, and rotating rigid bodies.
SimPy also provides shared resources to model limited-capacity congestion points (like gas stations, checkout lines and tunnels). Simulations can be performed “as fast as possible”, in real time (wall clock time) or by manually stepping through the events. Though it is theoretically possible to do continuous simulations with SimPy, it has no features that help you with that. On the other hand, SimPy is overkill for simulations with a fixed step size where your processes don’t interact with each other or with shared resources.
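To illustrate the event-stepping idea, here is a minimal discrete-event loop in plain Python. This is a sketch of the concept only, not SimPy's actual API: a priority queue ordered by event time stands in for the simulation clock.

```python
import heapq

def run_events(events):
    """Process (time, name) events in chronological order."""
    heapq.heapify(events)
    log = []
    while events:
        time, name = heapq.heappop(events)
        log.append((time, name))  # a real model would also schedule new events here
    return log

schedule = [(5, "arrive"), (2, "start"), (9, "finish")]
print(run_events(schedule))  # [(2, 'start'), (5, 'arrive'), (9, 'finish')]
```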
Python provides a large standard library which includes areas like internet
protocols, string operations, web services tools and operating system interfaces.
Many high use programming tasks have already been scripted into the standard
library which reduces length of code to be written significantly.
Python is free to use and distribute, even for commercial purposes. Further, its development is driven by the community, which collaborates on its code through hosting conferences and mailing lists, and provides its numerous modules.
Python has built-in list and dictionary data structures which can be used
to construct fast runtime data structures. Further, Python also provides the
option of dynamic high-level data typing which reduces the length of support
code that is needed.
3.7. PYTHON APPLICATIONS
With Python you can start by building simple applications such as calculators and to-do apps, and go ahead and create much more complicated applications.
For web scraping, Python has a library called Beautiful Soup which can be used to pull such data from web pages and use it accordingly.
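Beautiful Soup itself is a third-party package; the sketch below shows the same idea of pulling data out of a page using only the standard library's html.parser. The page string is hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from anchor tags, the kind of data
    a scraping library is typically used to pull from a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

page = '<html><body><a href="/report.pdf">Report</a><a href="/data">Data</a></body></html>'
parser = LinkCollector()
parser.feed(page)
print(parser.links)  # ['/report.pdf', '/data']
```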
CHAPTER-4
Roy, Sirohi, and Patle developed a system to detect lung cancer nodules using a fuzzy inference system and an active contour model. This system uses gray transformation for image contrast enhancement. Image binarization is performed before segmentation, and the resulting image is segmented using the active contour model. Cancer classification is performed using the fuzzy inference method. Features like area, mean, entropy, correlation, major axis length and minor axis length are extracted to train the classifier. The overall accuracy of the system is 94.12%. As a limitation, it does not classify the cancer as benign or malignant, which is the future scope of that model. Ignatious and Joseph [8] developed a system using watershed segmentation. In pre-processing it uses a Gabor filter to enhance the image quality. It compares the accuracy with a neural fuzzy model and the region growing method. The accuracy of the proposed system is 90.1%, which is comparatively higher than models using segmentation with the neural fuzzy model and the region growing method. The advantage of this model is that it uses marker-controlled watershed segmentation, which solves the over-segmentation problem. As a limitation, it does not classify the cancer as benign or malignant, and the accuracy, while high, is still not satisfactory. Some changes and contributions to this model could increase the accuracy to a satisfactory level.
4.2 LIMITATIONS
Features like area, eccentricity, circularity and fractal dimension, and textural features like mean, variance, energy, entropy, skewness, contrast, and smoothness are extracted to train a support vector machine classifier to identify whether the nodule is benign or malignant. The advantage of this model is that it classifies cancer as benign or malignant; however, its limitation is that prior information about the region of interest is required.
The best model ends after the detection of the cancer nodule, its feature extraction and the calculation of accuracy; its classification as benign or malignant has not been implemented. Therefore, an additional stage of classification of the cancer nodule has been performed using a Support Vector Machine. Extracted features are used as training features and a trained model is generated. Then, an unknown detected cancer nodule is classified using that trained prediction model.
For image pre-processing, firstly a median filter is used on the grayscale CT scan images. Some noise is embedded in CT images at the time of the image acquisition process, which aids in false detection of nodules (Suren Makaju et al., Procedia Computer Science 125 (2018) 107–114).
Lung cancer is one of the leading causes of cancer deaths. It is difficult to detect because it arises and shows symptoms in the final stage. However, the mortality rate can be reduced by early detection and treatment of the disease. CT imaging is the most reliable imaging technique for lung cancer diagnosis because it can disclose every suspected and unsuspected lung cancer nodule. However, variance of intensity in CT scan images and misjudgment of anatomical structure by doctors and radiologists might cause difficulty in marking the cancerous cells. Recently, to assist radiologists and doctors in detecting the cancer accurately, Computer Aided Diagnosis has become a supplementary and promising tool [3].
There have been many systems developed and there is ongoing research on the detection of lung cancer. However, some systems do not have satisfactory detection accuracy, and some still have to be improved to approach 100% accuracy. Image processing and machine learning techniques have been implemented to detect and classify lung cancer. We studied recent systems developed for cancer detection based on CT scan images of the lungs in order to choose the best recent systems; analysis was conducted on them and a new model was proposed.
Several researchers have proposed and implemented detection of lung cancer using different approaches of image processing and machine learning. Aggarwal, Furquan and Kalra proposed a model that provides classification between nodules and normal lung anatomy structure. The method extracts geometrical, statistical and gray-level characteristics. LDA is used as the classifier and optimal thresholding for segmentation. The system has 84% accuracy, 97.14% sensitivity and 53.33% specificity.
Although the system detects the cancer nodule, its accuracy is still unacceptable: no machine learning technique has been used for classification, and only simple segmentation techniques are used. Therefore, combining any of its steps into our new model does not promise improvement. Jin, Zhang and Jin used a convolutional neural network as the classifier in their CAD system to detect lung cancer. The system has an accuracy of about 90.7%. In image pre-processing, a median filter is used for noise removal, which can be useful in our new model to remove noise and improve accuracy. In Ignatious and Joseph's solution, image pre-processing uses a Gabor filter to enhance the image, and a marker-controlled watershed method is used for segmentation to detect the cancer nodule.
CHAPTER-5
PROPOSED MODEL
(Block diagram of the proposed model: each input lung image is classified as a normal image or an abnormal image.)
5.3 WORKING
The project working is divided into two parts, i.e., training and testing. A sample of 100 images is used, where 60 images are used for training and the remaining 40 images are used for testing.
The results obtained from the training and testing parts are fed into the CNN layers, where the images are classified and the output is obtained. The algorithm used for classification is ADABOOST (Adaptive Boosting), where the accuracy calculation is done based on the sample weights of the images.
5.4 ADVANTAGES
CHAPTER 6
MODULE DESCRIPTION
6.1 PRE-PROCESSING
Pre-processing refers to the transformations applied to our data before
feeding it to the algorithm. Data Preprocessing is a technique that is used to
convert the raw data into a clean data set. In other words, whenever the data is
gathered from different sources it is collected in raw format which is not
feasible for the analysis.
To achieve better results from the applied model in machine learning projects, the data has to be in a proper format. Some machine learning models need information in a specified format; for example, the Random Forest algorithm does not support null values, so null values have to be managed in the original raw data set before the algorithm can be executed. Another aspect is that the data set should be formatted so that more than one machine learning or deep learning algorithm can be executed on it, and the best of them chosen.
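As a minimal sketch of the null-value handling described above; the imputation strategy here, mean filling, is an illustrative choice, not the report's method:

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values,
    so that models which reject missing values can be trained."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

print(impute_mean([2.0, None, 4.0]))  # [2.0, 3.0, 4.0]
```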
6.2 FEATURE SELECTION
Feature selection has four different approaches: the filter approach, the wrapper approach, the embedded approach, and the hybrid approach.
1. Wrapper approach: A learning algorithm evaluates the usefulness of the selected features in classification. Wrapper methods can give high classification accuracy for particular classifiers.
2. Filter approach: A subset of features is selected without using any learning algorithm. This method is used for higher-dimensional datasets and is relatively faster than the wrapper-based approaches.
3. Embedded approach: The applied learning algorithm determines the specifics of this approach, which selects the features during the process of training on the data set.
4. Hybrid approach: Both filter and wrapper-based methods are used in the hybrid approach. It first selects a possible optimal feature set, which is then tested by the wrapper approach, and hence combines the advantages of both the filter and wrapper-based approaches.
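A small sketch of the filter approach, using variance as an illustrative ranking criterion (no learning algorithm involved); the feature names and values are hypothetical:

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def filter_select(features, k):
    """Filter approach: rank feature columns by variance and keep the top k.
    `features` maps a feature name to its column of values."""
    ranked = sorted(features, key=lambda name: variance(features[name]), reverse=True)
    return ranked[:k]

cols = {
    "area":     [10.0, 40.0, 90.0],   # high spread
    "entropy":  [1.0, 1.1, 0.9],      # nearly constant
    "contrast": [5.0, 9.0, 1.0],
}
print(filter_select(cols, 2))  # ['area', 'contrast']
```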
6.3 FEATURE EXTRACTION
If the number of features becomes similar to (or even bigger than!) the number of observations stored in a dataset, then this can most likely lead to a machine learning model suffering from overfitting. In order to avoid this type of problem, it is necessary to apply either regularization or dimensionality reduction techniques (feature extraction). In machine learning, the dimensionality of a dataset is the number of variables used to represent it.
Using regularization could certainly help reduce the risk of overfitting, but using feature extraction techniques instead can also lead to other types of advantages, such as:
• Accuracy improvements.
• Speed-up in training.
6.4 CNN LAYERS
Why ConvNets over Feed-Forward Neural Nets?
An image is nothing but a matrix of pixel values, so why not just flatten the image (e.g. a 3×3 image matrix into a 9×1 vector) and feed it to a Multi-Layer Perceptron for classification? Because flattening discards the spatial structure of the image, which is exactly what convolutional networks are designed to exploit.
There are four layered concepts we should understand in Convolutional Neural Networks:
1. Convolution,
2. ReLU,
3. Pooling, and
4. Full Connectedness (Fully Connected Layer).
6.4.1 Convolutional Layer
The convolutional layer is the core building block of a CNN. The layer's
parameters consist of a set of learnable filters (or kernels), which have a small
receptive field, but extend through the full depth of the input volume. During
the forward pass, each filter is convolved across the width and height of the
input volume, computing the dot product between the entries of the filter and
the input and producing a 2-dimensional activation map of that filter. As a
result, the network learns filters that activate when it detects some specific type
of feature at some spatial position in the input.
Stacking the activation maps for all filters along the depth dimension
forms the full output volume of the convolution layer. Every entry in the output
volume can thus also be interpreted as an output of a neuron that looks at a
small region in the input and shares parameters with neurons in the same
activation map.
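The forward pass described above can be sketched in a few lines of plain Python; this is an illustrative toy with a single hand-picked kernel, not the project's implementation:

```python
def convolve2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image and take
    dot products, producing one activation map (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]

# A vertical-edge kernel: it activates where intensity changes left to right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(convolve2d(image, kernel))  # [[0, 2, 0], [0, 2, 0]]
```

The activation map is strongest exactly over the 0→1 boundary, which is the "filter activates on a specific feature at a spatial position" behaviour described above.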
Convolutional layers exploit a sparse local connectivity pattern between neurons of adjacent layers: each neuron is connected to only a small region of the input volume.
6.4.3 Pooling Layer
The pooling layer operates upon each feature map separately to create a
new set of the same number of pooled feature maps. Pooling involves selecting
a pooling operation, much like a filter to be applied to feature maps. The size of
the pooling operation or filter is smaller than the size of the feature map; specifically, it is almost always 2×2 pixels applied with a stride of 2 pixels. This means that the pooling layer will always reduce the size of each feature map by a factor of 2, e.g. each dimension is halved, reducing the number of pixels or values in each feature map to one quarter of the size. For example, a pooling layer applied to a feature map of 6×6 (36 pixels) will result in an output pooled feature map of 3×3 (9 pixels).
• Average Pooling: Calculate the average value for each patch on the feature
map.
• Maximum Pooling (or Max Pooling): Calculate the maximum value for each
patch of the feature map.
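The 2×2, stride-2 arithmetic above can be checked with a short sketch (plain Python, illustrative only):

```python
def max_pool(fmap, size=2, stride=2):
    """Max pooling: each output value is the maximum of one size x size
    patch; with stride 2, each spatial dimension is halved."""
    rows = range(0, len(fmap) - size + 1, stride)
    cols = range(0, len(fmap[0]) - size + 1, stride)
    return [[max(fmap[i + u][j + v]
                 for u in range(size) for v in range(size))
             for j in cols]
            for i in rows]

# A 6x6 feature map (36 values) pools down to 3x3 (9 values).
fmap = [[r * 6 + c for c in range(6)] for r in range(6)]
pooled = max_pool(fmap)
print(len(pooled), len(pooled[0]))  # 3 3
```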
The result of using a pooling layer and creating downsampled or pooled feature maps is a summarized version of the features detected in the input. Pooling is useful because small changes in the location of a feature in the input, as detected by the convolutional layer, still result in a pooled feature map with the feature in the same location. This capability added by pooling is called the model's invariance to local translation.
The fully connected part of the CNN goes through its own backpropagation process to determine the most accurate weights. Each neuron receives weights that prioritize the most appropriate label. Finally, the neurons "vote" on each of the labels, and the winner of that vote is the classification decision.
Fig 6.4.1 Fully Connected Structure
6.5 DATA AUGMENTATION
Operations in data augmentation
The most commonly used operations are:
1. Rotation
2. Shearing
3. Zooming
4. Cropping
5. Flipping
6. Changing the brightness level
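A minimal sketch of two of these operations on a tiny image represented as a nested list; this is illustrative only, as real pipelines typically use libraries such as Keras for augmentation:

```python
def hflip(img):
    """Horizontal flip: mirror each row."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
print(hflip(img))     # [[2, 1], [4, 3]]
print(rotate90(img))  # [[3, 1], [4, 2]]
```

Each transformed copy is a new training sample with the same label, which is how augmentation enlarges a small dataset like the one used here.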
6.6 ADABOOST ALGORITHM
A weak classifier is prepared on the training data using the weighted samples. Only binary classification problems are supported, so each decision stump makes one decision on one input variable and outputs a +1.0 or -1.0 value for the first or second class.
The misclassification rate for the trained model is calculated as error = (N – correct) / N, where correct is the number of training instances predicted correctly and N is the total number of training instances.
• Basically, weak models are added sequentially, trained using the weighted
training data.
• Generally, the process continues until a pre-set number of weak learners
have been created.
• Once completed, you are left with a pool of weak learners each with a stage
value.
Predictions are made by calculating the weighted average of the weak
classifiers.
For a new input instance, each weak learner calculates a predicted value as either +1.0 or -1.0. The predicted values are weighted by each weak learner's stage value. The prediction for the ensemble model is taken as the sum of the weighted predictions: if the sum is positive, the first class is predicted; if negative, the second class is predicted.
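The weighted-vote rule can be sketched as follows; the stumps and stage values below are made up for illustration and are not the trained model's values:

```python
def ensemble_predict(learners, x):
    """AdaBoost-style prediction: each weak learner votes +1 or -1,
    votes are weighted by the learner's stage value, and the sign
    of the weighted sum picks the class."""
    total = sum(stage * predict(x) for stage, predict in learners)
    return 1 if total > 0 else -1

# Three hypothetical decision stumps, each thresholding one feature.
learners = [
    (0.9, lambda x: 1 if x[0] > 0.5 else -1),
    (0.4, lambda x: 1 if x[1] > 0.5 else -1),
    (0.3, lambda x: -1),
]
print(ensemble_predict(learners, [0.8, 0.2]))  # 1
```

Note that the first stump's high stage value lets it outvote the other two combined (0.9 > 0.4 + 0.3), which is exactly the behaviour the weighting is meant to produce.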
CHAPTER 7
RESULT AND ANALYSIS
7.1 TRAINING PART RESULT
Fig 7.1 Training Part Result
The model is implemented in Python, and the system is trained with sample data sets so that the model learns to recognise lung cancer. A sample image is fed as input to the trained model, and the model at this stage is able to tell the presence of cancer and locate the cancer spot in the sample image of a cancerous lung. The process involves feeding in the input image, pre-processing, feature extraction, identifying the cancer spot and indicating the results to the user.
7.2 CLASSIFICATION PART RESULT
Lung cancer is detected using a convolutional neural network modelled by end-to-end learning. Among the parameters used for training, the CNN model includes 2 convolution layers.
The confusion matrix shows the true positives, true negatives, false positives and false negatives. From the analysis, the true positives are the correctly classified lung cancer images, while the false positives are misclassifications in which a non-cancerous image is wrongly predicted as cancerous.
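The four counts can be tallied with a short sketch (the labels and lists here are hypothetical, not the project's test set):

```python
def confusion_counts(actual, predicted, positive="cancer"):
    """Tally true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive: tp += 1   # cancer predicted, cancer present
            else:             fp += 1   # cancer predicted, but image is normal
        else:
            if a == positive: fn += 1   # normal predicted, but cancer present
            else:             tn += 1   # normal predicted, image is normal
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

actual    = ["cancer", "normal", "cancer", "normal"]
predicted = ["cancer", "cancer", "normal", "normal"]
print(confusion_counts(actual, predicted))
# {'TP': 1, 'FP': 1, 'TN': 1, 'FN': 1}
```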
CHAPTER 8
CONCLUSION
The training and testing of images are done: images are pre-processed, and feature selection and feature extraction are performed. Once the training and testing parts are completed successfully, the CNN algorithm classifies the input lung image as either normal or abnormal, and the output is displayed.
Hence, a deep CNN network is used for the classification of lung images for the detection of cancer.
REFERENCES
[4] S. Rattan, S. Kaur, N. Kansal, and J. Kaur, ‘‘An optimized lung cancer
classification system for computed tomography images,’’ in Proc. 4th Int. Conf.
Image Inf. Process. (ICIIP), Dec. 2017, pp. 1–6.
[9] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 770–778.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[22] S. Kundu, R. Mitra, and S. Misra, ‘‘Squamous cell carcinoma lung with
progressive systemic sclerosis,’’ J. Assoc. Phys. India, vol. 60, no. 12, pp. 52–
54, 2012.
[23] D. Sharma and G. Jindal, ‘‘Computer aided diagnosis system for detection
of lung cancer in CT scan images,’’ Int. J. Comput. Elect. Eng., vol. 3, no. 5, pp.
714–718, Sep. 2011.
[25] X. D. Teng, ‘‘World Health Organization classification of tumours, pathology and genetics of tumours of the lung,’’ Zhonghua Bing Li Xue Za Zhi/Chin. J. Pathol., vol. 34, no. 8, p. 544, 2005.