DEEP LEARNING
A PROJECT REPORT
Submitted by
DIVAGAR. L 721917106021
SATHYA. H 721917106075
THILAKESH. A 721917106090
of
BACHELOR OF ENGINEERING
in
APRIL 2021
ANNA UNIVERSITY : CHENNAI 600 025
BONAFIDE CERTIFICATE
__________________ ____________________
SIGNATURE SIGNATURE
Mr. S. MUKUNTHAN, M.TECH.          Mrs. S. G. RAMA PRIYANGA, M.E.
----------------------- ---------------------------
INTERNAL EXAMINER EXTERNAL EXAMINER
ACKNOWLEDGEMENT
We thank our beloved principal Dr. P. MALATHI, M.E., Ph.D., for her valuable support and encouragement.
We express heartfelt thanks to our parents and friends for their support
throughout our career. We would like to thank everyone who had helped us
directly and indirectly in this project work. We thank the Lord Almighty.
ABSTRACT
Lung diseases are disorders that affect the lungs, the organs that enable breathing. Lung cancer is one of the most common causes of death among people throughout the world, and early detection of lung cancer can increase the chance of survival.
The overall survival rate for lung cancer patients increases from 14% to 49% if the disease is detected in time.
Although Computed Tomography (CT) is more efficient than X-ray imaging, a comprehensive diagnosis generally requires multiple imaging methods that complement each other.
In this work, a deep neural network that identifies lung cancer from CT images is proposed. A densely connected convolutional neural network (DenseNet) and the adaptive boosting (AdaBoost) algorithm are used to classify a lung as normal or malignant.
A dataset of 201 lung images is used, in which 85% of the images are used for training and 15% are used for testing and classification. Experimental results show that the proposed method achieves an accuracy of 90.85%.
LIST OF CONTENTS
1 INTRODUCTION
1.1 Prediction
1.2 Staging
1.3 Survey
2 LITERATURE SURVEY
2.1 Lung cancer detection & classification using deep learning
2.2 Detection and classification of lung abnormalities by use of convolutional neural network (CNN) and regions with CNN features (R-CNN)
2.3 Detection and classification of pulmonary nodules using convolutional neural network
2.4 Multiple resolution residually connected feature streams, automatic lung tumor segmentation from CT images
2.5 Lung image patch classification with automatic feature learning
3 SYSTEM REQUIREMENTS
3.1 Software Used
3.2 Language
3.3 Software Details
3.4 Python Modeling and Simulation
3.5 Python Advantages
3.6 Python Applications
6 MODULE DESCRIPTION
6.1 Pre-processing
6.2 Feature Selection
6.3 Feature Extraction
6.4 CNN Layers
6.5 Data Augmentation
6.6 AdaBoost Algorithm
7 RESULT AND ANALYSIS
7.1 Training Part Result
7.2 Classification Part Result
8 CONCLUSION
REFERENCES
LIST OF FIGURES
Fig 6.4.1 Fully Connected Structure
Fig 7.1 Training Part Result
LIST OF ABBREVIATIONS
ACRONYM EXPANSION
ADABOOST Adaptive Boosting
AI Artificial Intelligence
API Application Programming Interface
CAD Computer Aided Diagnosis
CNN Convolutional Neural Network
CT Computed Tomography
EDA Exploratory Data Analysis
GLCM Gray-Level Co-occurrence Matrix
HU Hounsfield Unit
IDRI Image Database Resource Initiative
LIDC Lung Image Database Consortium
LUNA16 Lung Nodule Analysis 2016
MRI Magnetic Resonance Imaging
MRRN Multiple Resolution Residually Connected Network
NSCLC Non-Small Cell Lung Cancer
PyPI Python Package Index
ROI Region of Interest
CHAPTER-1
INTRODUCTION
Lung cancer has become one of the most common causes of death in the world. It is one of the malignant tumors most harmful to human health: its mortality rate ranks first among malignant tumors, and it is the number one cancer killer among men and women worldwide. There are about 1.8 million new cases of lung cancer per year (13% of all tumors) and 1.6 million deaths (19.4% of all tumor deaths) in the world [4], and the 5-year survival rate is only 18%. Lung cancer is a disease in which abnormal cells multiply and grow into a tumor. The mortality rate of lung cancer is the highest among all types of cancer. An estimated 85 percent of lung cancer cases in males and 75 percent in females are caused by cigarette smoking. Lung cancer is one of the most dreadful diseases in the developing countries, and its mortality rate is 19.4%.
1.1. PREDICTION
Lung cancer is one of the most serious cancers in the world, with the lowest survival rate after diagnosis and a gradual increase in the number of deaths every year. Survival from lung cancer is directly related to how far the cancer has grown at detection time, and people have a much higher chance of survival if the cancer is detected in its early stages. Cancer cells can be carried away from the lungs in blood or in the lymph fluid that surrounds lung tissue. Lymph flows through lymphatic vessels, which drain into lymph nodes located in the lungs and in the centre of the chest. Lung cancer is one of the deadliest diseases in the developing countries, and detecting the cancer at an early stage is a challenge. Analysis and cure of lung malignancy have been among the greatest difficulties faced by humans over the recent couple of decades, and early identification of tumors would help save a huge number of lives across the globe every year. This work presents an approach which utilizes a Convolutional Neural Network (CNN) to classify the tumors found in the lung as malignant or benign; the accuracy obtained by this approach is 90.85%.
1.2. STAGING
Lung cancer often spreads toward the center of the chest, because the natural flow of lymph out of the lungs is toward the center of the chest. Lung cancer can be divided into two main groups, non-small cell lung cancer and small cell lung cancer; these types are assigned based on their cellular characteristics. Staging is based on tumor size and lymph node location. Presently, CT is said to be more effective than plain chest X-ray in detecting and diagnosing lung cancer. Early detection of lung tumors is done using many imaging techniques such as Computed Tomography (CT), Sputum Cytology, Chest X-ray and Magnetic Resonance Imaging (MRI). Detection means classifying a tumor into two classes: (i) non-cancerous tumor (benign) and (ii) cancerous tumor (malignant). The chance of survival at an advanced stage is low compared with diagnosis at an early stage, when treatment and lifestyle give patients a better chance of surviving cancer therapy. Manual analysis and diagnosis can be greatly improved with the implementation of image processing techniques. A number of studies on image processing techniques for early-stage cancer detection are available in the literature, but the hit ratio of early-stage detection has not greatly improved. With advancements in machine learning techniques, early diagnosis of cancer has been attempted by many researchers. Neural networks play a key role in recognizing cancer cells among normal tissues, which in turn provides an effective tool for building assistive AI-based cancer detection. Cancer treatment is effective only when the tumor cells are accurately separated from the normal cells. Classification of the tumor cells and training of the neural network form the basis for machine learning based cancer diagnosis.
1.3. SURVEY
Lung cancer has become one of the most significant diseases in human history. The World Health Organization estimates that the worldwide death toll from lung cancer will reach 10,000,000 by 2030. The 5-year survival rate for advanced Non Small Cell Lung Cancer (NSCLC) remains disappointingly low. It has been hypothesized that quantitative image feature analysis can improve diagnostic, prognostic or predictive accuracy, and therefore will have an impact on a significant number of patients. In the current study, standard-of-care clinical computed tomography (CT) scans were used for image feature extraction. In order to reduce variability in feature extraction, the first and essential step is to accurately delineate the lung tumors. Accurate delineation of lung tumors is also crucial for optimal radiation oncology. A common approach to delineating tumors from CT scans involves radiologists or radiation oncologists manually drawing the boundary of the tumor. In the majority of cases, manual segmentation overestimates the lesion volume to ensure the entire lesion is identified, and the process is highly variable. A stable, accurate segmentation is critical, as image features (such as texture and shape related features) are sensitive to small tumor boundary changes.
CHAPTER -2
LITERATURE SURVEY
2.1. TITLE: Lung cancer detection & classification using deep learning
DRAWBACK
The lack of strict clinical guidelines and the resemblance between the different ILD findings make the radiological diagnosis problematic.
2.2. TITLE: Detection and classification of lung abnormalities by use of convolutional neural network (CNN) and regions with CNN features (R-CNN) - 2018
AUTHORS: Shoji Kido, Yasushi Hirano, Noriaki Hashimoto.
DRAWBACK
2.3. TITLE: Detection and classification of pulmonary nodules using convolutional neural network
The number of radiologists available to read CT scans is limited, and they have been overworked. Recently, numerous methods, especially ones based on deep learning with convolutional neural networks (CNN), have been developed to automatically detect and classify pulmonary nodules in medical images. This paper presents a comprehensive analysis of these methods and their performances. First, it briefly introduces the fundamental knowledge of CNNs as well as the reasons for their suitability to medical image analysis.
2.5. TITLE: Lung Image Patch Classification with Automatic Feature
Learning-2015
CHAPTER -3
SYSTEM REQUIREMENTS
3.1. SOFTWARE USED
SOFTWARE : PYTHON
3.2. LANGUAGE
The Python standard library has two modules (itertools and functools) that implement functional tools borrowed from Haskell and Standard ML.
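A quick sketch of the functional style these two modules enable; all of the calls below are standard-library:

```python
from functools import reduce
from itertools import accumulate, chain

# Running sums, in the style of Haskell's scanl.
print(list(accumulate([1, 2, 3, 4])))                   # [1, 3, 6, 10]

# Fold a list down to one value, like a left fold in ML.
print(reduce(lambda acc, x: acc * x, [1, 2, 3, 4], 1))  # 24

# Lazily join iterables without building an intermediate list.
print(list(chain("ab", "cd")))                          # ['a', 'b', 'c', 'd']
```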
Rather than having all of its functionality built into its core, Python was
designed to be highly extensible. This compact modularity has made it
particularly popular as a means of adding programmable interfaces to existing
applications. Van Rossum's vision of a small core language with a large
standard library and easily extensible interpreter stemmed from his frustrations
with ABC, which espoused the opposite approach. Python strives for a simpler,
less-cluttered syntax and grammar while giving developers a choice in their
coding methodology. In contrast to Perl's "there is more than one way to do it"
motto, Python embraces a "there should be one—and preferably only one—
obvious way to do it" design philosophy. Alex Martelli, a Fellow at the Python
Software Foundation and Python book author, writes that "To describe
something as 'clever' is not considered a compliment in the Python culture."
Code that is difficult to understand or reads like a rough transcription from another programming language is called unpythonic. Users and admirers of Python, especially those considered knowledgeable or experienced, are often referred to as Pythonistas.
3.4. PYTHON MODELING AND SIMULATION
• The first part presents discrete models, including a bike share system and
world population growth.
• The second part introduces first-order systems, including models of
infectious disease, thermal systems, and pharmacokinetics.
• The third part is about second-order systems, including mechanical
systems like projectiles, celestial mechanics, and rotating rigid bodies.
SimPy also provides shared resources to model limited-capacity congestion points (like gas stations, checkout lines and tunnels). Simulations can be performed “as fast as possible”, in real time (wall clock time) or by manually stepping through the events. Though it is theoretically possible to do continuous simulations with SimPy, it has no features that help you with that. On the other hand, SimPy is overkill for simulations with a fixed step size where your processes don’t interact with each other or with shared resources.
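To illustrate the event-stepping idea, here is a minimal discrete-event loop in plain Python. This is a sketch of the concept only, not SimPy's actual API: a priority queue ordered by event time stands in for the simulation clock.

```python
import heapq

def run_events(events):
    """Process (time, name) events in chronological order."""
    heapq.heapify(events)
    log = []
    while events:
        time, name = heapq.heappop(events)
        log.append((time, name))  # a real model would also schedule new events here
    return log

schedule = [(5, "arrive"), (2, "start"), (9, "finish")]
print(run_events(schedule))  # [(2, 'start'), (5, 'arrive'), (9, 'finish')]
```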
Python provides a large standard library which includes areas like internet
protocols, string operations, web services tools and operating system interfaces.
Many high use programming tasks have already been scripted into the standard
library which reduces length of code to be written significantly.
Python is free to use and distribute, even for commercial purposes. Further, its development is driven by the community, which collaborates on its code through hosting conferences and mailing lists, and provides its numerous modules.
Python has built-in list and dictionary data structures which can be used
to construct fast runtime data structures. Further, Python also provides the
option of dynamic high-level data typing which reduces the length of support
code that is needed.
3.7. PYTHON APPLICATIONS
With Python you can start by building simple applications such as calculators and to-do apps, and go ahead and create much more complicated applications.
For web scraping, Python has a library called Beautiful Soup which can be used to pull such data from web pages and use it accordingly.
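Beautiful Soup itself is a third-party package; the sketch below shows the same idea of pulling data out of a page using only the standard library's html.parser. The page string is hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from anchor tags, the kind of data
    a scraping library is typically used to pull from a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

page = '<html><body><a href="/report.pdf">Report</a><a href="/data">Data</a></body></html>'
parser = LinkCollector()
parser.feed(page)
print(parser.links)  # ['/report.pdf', '/data']
```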
CHAPTER-4
Roy, Sirohi, and Patle developed a system to detect lung cancer nodules using a fuzzy inference system and an active contour model. This system uses gray transformation for image contrast enhancement. Image binarization is performed before segmentation, and the resulting image is segmented using the active contour model. Cancer classification is performed using the fuzzy inference method. Features like area, mean, entropy, correlation, major axis length and minor axis length are extracted to train the classifier. The overall accuracy of the system is 94.12%. As a limitation, it does not classify the cancer as benign or malignant, which is the future scope of that model. Ignatious and Joseph [8] developed a system using watershed segmentation. In pre-processing it uses a Gabor filter to enhance the image quality. It compares the accuracy with a neural fuzzy model and the region growing method. The accuracy of the proposed system is 90.1%, which is comparatively higher than models using segmentation with the neural fuzzy model and the region growing method. The advantage of this model is that it uses marker-controlled watershed segmentation, which solves the over-segmentation problem. As a limitation, it does not classify the cancer as benign or malignant, and the accuracy, while high, is still not satisfactory. Some changes and contributions to this model could increase the accuracy to a satisfactory level.
4.2 LIMITATIONS
Features like area, eccentricity, circularity and fractal dimension, and textural features like mean, variance, energy, entropy, skewness, contrast, and smoothness are extracted to train a support vector machine classifier to identify whether the nodule is benign or malignant. The advantage of this model is that it classifies cancer as benign or malignant; however, its limitation is that prior information about the region of interest is required.
The best model ends after the detection of the cancer nodule, its feature extraction and the calculation of accuracy; its classification as benign or malignant has not been implemented. Therefore, an additional stage of classification of the cancer nodule has been performed using a Support Vector Machine. Extracted features are used as training features and a trained model is generated. Then, an unknown detected cancer nodule is classified using that trained prediction model.
For image pre-processing, firstly a median filter is used on the grayscale CT scan images. Some noise is embedded in CT images at the time of the image acquisition process, which aids in false detection of nodules (Suren Makaju et al., Procedia Computer Science 125 (2018) 107–114).
Lung cancer is one of the leading causes of cancer deaths. It is difficult to detect because it arises and shows symptoms in the final stage. However, the mortality rate can be reduced by early detection and treatment of the disease. CT imaging is the most reliable imaging technique for lung cancer diagnosis because it can disclose every suspected and unsuspected lung cancer nodule. However, variance of intensity in CT scan images and misjudgment of anatomical structure by doctors and radiologists might cause difficulty in marking the cancerous cells. Recently, to assist radiologists and doctors in detecting the cancer accurately, Computer Aided Diagnosis has become a supplementary and promising tool [3].
There have been many systems developed and there is ongoing research on the detection of lung cancer. However, some systems do not have satisfactory detection accuracy, and some still have to be improved to approach 100% accuracy. Image processing and machine learning techniques have been implemented to detect and classify lung cancer. We studied recent systems developed for cancer detection based on CT scan images of the lungs in order to choose the best recent systems; analysis was conducted on them and a new model was proposed.
Several researchers have proposed and implemented detection of lung cancer using different approaches of image processing and machine learning. Aggarwal, Furquan and Kalra proposed a model that provides classification between nodules and normal lung anatomy structure. The method extracts geometrical, statistical and gray-level characteristics. LDA is used as the classifier and optimal thresholding for segmentation. The system has 84% accuracy, 97.14% sensitivity and 53.33% specificity.
Although the system detects the cancer nodule, its accuracy is still unacceptable: no machine learning technique has been used for classification, and only simple segmentation techniques are used. Therefore, combining any of its steps into our new model does not promise improvement. Jin, Zhang and Jin used a convolutional neural network as the classifier in their CAD system to detect lung cancer. The system has an accuracy of about 90.7%. In image pre-processing, a median filter is used for noise removal, which can be useful in our new model to remove noise and improve accuracy. In Ignatious and Joseph's solution, image pre-processing uses a Gabor filter to enhance the image, and a marker-controlled watershed method is used for segmentation to detect the cancer nodule.
CHAPTER-5
PROPOSED MODEL
(Block diagram of the proposed model: each input lung image is classified as a normal image or an abnormal image.)
5.3 WORKING
The project working is divided into two parts, i.e., training and testing. A sample of 100 images is used, where 60 images are used for training and the remaining 40 images are used for testing.
The results obtained from the training and testing parts are fed into the CNN layers, where the images are classified and the output is obtained. The algorithm used for classification is ADABOOST (Adaptive Boosting), where the accuracy calculation is done based on the sample weights of the images.
5.4 ADVANTAGES
CHAPTER 6
MODULE DESCRIPTION
6.1 PRE-PROCESSING
Pre-processing refers to the transformations applied to our data before
feeding it to the algorithm. Data Preprocessing is a technique that is used to
convert the raw data into a clean data set. In other words, whenever the data is
gathered from different sources it is collected in raw format which is not
feasible for the analysis.
To achieve better results from the applied model in machine learning projects, the data has to be in a proper format. Some machine learning models need information in a specified format; for example, the Random Forest algorithm does not support null values, so null values have to be managed in the original raw data set before the algorithm can be executed. Another aspect is that the data set should be formatted so that more than one machine learning or deep learning algorithm can be executed on it, and the best of them chosen.
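As a minimal sketch of the null-value handling described above; the imputation strategy here, mean filling, is an illustrative choice, not the report's method:

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values,
    so that models which reject missing values can be trained."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

print(impute_mean([2.0, None, 4.0]))  # [2.0, 3.0, 4.0]
```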
6.2 FEATURE SELECTION
Feature selection has four different approaches: the filter approach, the wrapper approach, the embedded approach, and the hybrid approach.
1. Wrapper approach: A learning algorithm evaluates the usefulness of the selected features in classification. Wrapper methods can give high classification accuracy for particular classifiers.
2. Filter approach: A subset of features is selected without using any learning algorithm. This method is used for higher-dimensional datasets and is relatively faster than the wrapper-based approaches.
3. Embedded approach: The applied learning algorithm determines the specifics of this approach, which selects the features during the process of training on the data set.
4. Hybrid approach: Both filter and wrapper-based methods are used in the hybrid approach. It first selects a possible optimal feature set, which is then tested by the wrapper approach, and hence combines the advantages of both the filter and wrapper-based approaches.
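A small sketch of the filter approach, using variance as an illustrative ranking criterion (no learning algorithm involved); the feature names and values are hypothetical:

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def filter_select(features, k):
    """Filter approach: rank feature columns by variance and keep the top k.
    `features` maps a feature name to its column of values."""
    ranked = sorted(features, key=lambda name: variance(features[name]), reverse=True)
    return ranked[:k]

cols = {
    "area":     [10.0, 40.0, 90.0],   # high spread
    "entropy":  [1.0, 1.1, 0.9],      # nearly constant
    "contrast": [5.0, 9.0, 1.0],
}
print(filter_select(cols, 2))  # ['area', 'contrast']
```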
6.3 FEATURE EXTRACTION
If the number of features becomes similar to (or even bigger than!) the number of observations stored in a dataset, then this can most likely lead to a machine learning model suffering from overfitting. In order to avoid this type of problem, it is necessary to apply either regularization or dimensionality reduction techniques (feature extraction). In machine learning, the dimensionality of a dataset is the number of variables used to represent it.
Using regularization could certainly help reduce the risk of overfitting, but using feature extraction techniques instead can also lead to other types of advantages, such as:
• Accuracy improvements.
• Speed-up in training.
6.4 CNN LAYERS
Why ConvNets over Feed-Forward Neural Nets?
An image is nothing but a matrix of pixel values, so why not just flatten the image (e.g. a 3×3 image matrix into a 9×1 vector) and feed it to a Multi-Layer Perceptron for classification? Because flattening discards the spatial structure of the image, which is exactly what convolutional networks are designed to exploit.
There are four layered concepts we should understand in Convolutional Neural Networks:
1. Convolution,
2. ReLU,
3. Pooling, and
4. Full Connectedness (Fully Connected Layer).
6.4.1 Convolutional Layer
The convolutional layer is the core building block of a CNN. The layer's
parameters consist of a set of learnable filters (or kernels), which have a small
receptive field, but extend through the full depth of the input volume. During
the forward pass, each filter is convolved across the width and height of the
input volume, computing the dot product between the entries of the filter and
the input and producing a 2-dimensional activation map of that filter. As a
result, the network learns filters that activate when it detects some specific type
of feature at some spatial position in the input.
Stacking the activation maps for all filters along the depth dimension
forms the full output volume of the convolution layer. Every entry in the output
volume can thus also be interpreted as an output of a neuron that looks at a
small region in the input and shares parameters with neurons in the same
activation map.
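The forward pass described above can be sketched in a few lines of plain Python; this is an illustrative toy with a single hand-picked kernel, not the project's implementation:

```python
def convolve2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image and take
    dot products, producing one activation map (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]

# A vertical-edge kernel: it activates where intensity changes left to right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(convolve2d(image, kernel))  # [[0, 2, 0], [0, 2, 0]]
```

The activation map is strongest exactly over the 0→1 boundary, which is the "filter activates on a specific feature at a spatial position" behaviour described above.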
Convolutional layers exploit a sparse local connectivity pattern between neurons of adjacent layers: each neuron is connected to only a small region of the input volume.
6.4.3 Pooling Layer
The pooling layer operates upon each feature map separately to create a
new set of the same number of pooled feature maps. Pooling involves selecting
a pooling operation, much like a filter to be applied to feature maps. The size of
the pooling operation or filter is smaller than the size of the feature map; specifically, it is almost always 2×2 pixels applied with a stride of 2 pixels. This means that the pooling layer will always reduce the size of each feature map by a factor of 2, e.g. each dimension is halved, reducing the number of pixels or values in each feature map to one quarter of the size. For example, a pooling layer applied to a feature map of 6×6 (36 pixels) will result in an output pooled feature map of 3×3 (9 pixels).
• Average Pooling: Calculate the average value for each patch on the feature
map.
• Maximum Pooling (or Max Pooling): Calculate the maximum value for each
patch of the feature map.
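The 2×2, stride-2 arithmetic above can be checked with a short sketch (plain Python, illustrative only):

```python
def max_pool(fmap, size=2, stride=2):
    """Max pooling: each output value is the maximum of one size x size
    patch; with stride 2, each spatial dimension is halved."""
    rows = range(0, len(fmap) - size + 1, stride)
    cols = range(0, len(fmap[0]) - size + 1, stride)
    return [[max(fmap[i + u][j + v]
                 for u in range(size) for v in range(size))
             for j in cols]
            for i in rows]

# A 6x6 feature map (36 values) pools down to 3x3 (9 values).
fmap = [[r * 6 + c for c in range(6)] for r in range(6)]
pooled = max_pool(fmap)
print(len(pooled), len(pooled[0]))  # 3 3
```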
The result of using a pooling layer and creating downsampled or pooled feature maps is a summarized version of the features detected in the input. Pooling is useful because small changes in the location of a feature in the input, as detected by the convolutional layer, still result in a pooled feature map with the feature in the same location. This capability added by pooling is called the model's invariance to local translation.
The fully connected part of the CNN goes through its own backpropagation process to determine the most accurate weights. Each neuron receives weights that prioritize the most appropriate label. Finally, the neurons "vote" on each of the labels, and the winner of that vote is the classification decision.
Fig 6.4.1 Fully Connected Structure
6.5 DATA AUGMENTATION
Operations in data augmentation
The most commonly used operations are:
1. Rotation
2. Shearing
3. Zooming
4. Cropping
5. Flipping
6. Changing the brightness level
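A minimal sketch of two of these operations on a tiny image represented as a nested list; this is illustrative only, as real pipelines typically use libraries such as Keras for augmentation:

```python
def hflip(img):
    """Horizontal flip: mirror each row."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
print(hflip(img))     # [[2, 1], [4, 3]]
print(rotate90(img))  # [[3, 1], [4, 2]]
```

Each transformed copy is a new training sample with the same label, which is how augmentation enlarges a small dataset like the one used here.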
6.6 ADABOOST ALGORITHM
A weak classifier is prepared on the training data using the weighted samples. Only binary classification problems are supported, so each decision stump makes one decision on one input variable and outputs a +1.0 or -1.0 value for the first or second class.
The misclassification rate for the trained model is calculated as error = (N – correct) / N, where correct is the number of training instances predicted correctly and N is the total number of training instances.
• Basically, weak models are added sequentially, trained using the weighted
training data.
• Generally, the process continues until a pre-set number of weak learners
have been created.
• Once completed, you are left with a pool of weak learners each with a stage
value.
Predictions are made by calculating the weighted average of the weak
classifiers.
For a new input instance, each weak learner calculates a predicted value as either +1.0 or -1.0. The predicted values are weighted by each weak learner's stage value. The prediction for the ensemble model is taken as the sum of the weighted predictions: if the sum is positive, the first class is predicted; if negative, the second class is predicted.
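The weighted-vote rule can be sketched as follows; the stumps and stage values below are made up for illustration and are not the trained model's values:

```python
def ensemble_predict(learners, x):
    """AdaBoost-style prediction: each weak learner votes +1 or -1,
    votes are weighted by the learner's stage value, and the sign
    of the weighted sum picks the class."""
    total = sum(stage * predict(x) for stage, predict in learners)
    return 1 if total > 0 else -1

# Three hypothetical decision stumps, each thresholding one feature.
learners = [
    (0.9, lambda x: 1 if x[0] > 0.5 else -1),
    (0.4, lambda x: 1 if x[1] > 0.5 else -1),
    (0.3, lambda x: -1),
]
print(ensemble_predict(learners, [0.8, 0.2]))  # 1
```

Note that the first stump's high stage value lets it outvote the other two combined (0.9 > 0.4 + 0.3), which is exactly the behaviour the weighting is meant to produce.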
CHAPTER 7
RESULT AND ANALYSIS
7.1 TRAINING PART RESULT
Fig 7.1 Training Part Result
The model is implemented in Python, and the system is trained with sample data sets so that the model learns to recognise lung cancer. A sample image is fed as input to the trained model, and the model at this stage is able to tell the presence of cancer and locate the cancer spot in the sample image of a cancerous lung. The process involves feeding in the input image, pre-processing, feature extraction, identifying the cancer spot and indicating the results to the user.
7.2 CLASSIFICATION PART RESULT
Lung cancer is detected using a convolutional neural network modelled by end-to-end learning. Among the parameters used for training, the CNN model includes 2 convolution layers.
The confusion matrix shows the true positives, true negatives, false positives and false negatives. From the analysis, the true positives are the correctly classified lung cancer images, while the false positives are misclassifications in which a non-cancerous image is wrongly predicted as cancerous.
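The four counts can be tallied with a short sketch (the labels and lists here are hypothetical, not the project's test set):

```python
def confusion_counts(actual, predicted, positive="cancer"):
    """Tally true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive: tp += 1   # cancer predicted, cancer present
            else:             fp += 1   # cancer predicted, but image is normal
        else:
            if a == positive: fn += 1   # normal predicted, but cancer present
            else:             tn += 1   # normal predicted, image is normal
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

actual    = ["cancer", "normal", "cancer", "normal"]
predicted = ["cancer", "cancer", "normal", "normal"]
print(confusion_counts(actual, predicted))
# {'TP': 1, 'FP': 1, 'TN': 1, 'FN': 1}
```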
CHAPTER 8
CONCLUSION
The training and testing of images are done: images are pre-processed, and feature selection and feature extraction are performed. Once the training and testing parts are completed successfully, the CNN algorithm classifies the input lung image as either normal or abnormal, and the output is displayed.
Hence, a deep CNN network is used for the classification of lung images for the detection of cancer.
REFERENCES
[4] S. Rattan, S. Kaur, N. Kansal, and J. Kaur, ‘‘An optimized lung cancer
classification system for computed tomography images,’’ in Proc. 4th Int. Conf.
Image Inf. Process. (ICIIP), Dec. 2017, pp. 1–6.
[9] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 770–778.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[22] S. Kundu, R. Mitra, and S. Misra, ‘‘Squamous cell carcinoma lung with
progressive systemic sclerosis,’’ J. Assoc. Phys. India, vol. 60, no. 12, pp. 52–
54, 2012.
[23] D. Sharma and G. Jindal, ‘‘Computer aided diagnosis system for detection
of lung cancer in CT scan images,’’ Int. J. Comput. Elect. Eng., vol. 3, no. 5, pp.
714–718, Sep. 2011.
[25] X. D. Teng, ‘‘World Health Organization classification of tumours, pathology and genetics of tumours of the lung,’’ Zhonghua Bing Li Xue Za Zhi/Chin. J. Pathol., vol. 34, no. 8, p. 544, 2005.