
LUNG CANCER DETECTION USING

DEEP LEARNING

A PROJECT REPORT

Submitted by

DIVAGAR. L 721917106021

SATHYA. H 721917106075

THILAKESH. A 721917106090

WILSON SAMUEL. S 721917106095

in partial fulfillment for the award of the degree

of

BACHELOR OF ENGINEERING
in

ELECTRONICS AND COMMUNICATION ENGINEERING

DHANALAKSHMI SRINIVASAN COLLEGE OF ENGINEERING,

COIMBATORE – 641 105

ANNA UNIVERSITY : CHENNAI 600 025

APRIL 2021
ANNA UNIVERSITY : CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “LUNG CANCER DETECTION USING

DEEP LEARNING” is the bona fide work of

“DIVAGAR. L (721917106021), SATHYA. H (721917106075),

THILAKESH. A (721917106090), WILSON SAMUEL. S (721917106095)”

who carried out the project work under my supervision.

__________________ ____________________

SIGNATURE SIGNATURE
Mr. S.MUKUNTHAN M.TECH Mrs.S.G. RAMA PRIYANGA, M.E

HEAD OF THE DEPARTMENT SUPERVISOR

Assistant Professor Assistant Professor

Department of Electronics and Department of Electronics and


Communication Engineering Communication Engineering
Dhanalakshmi Srinivasan College Dhanalakshmi Srinivasan College
of Engineering of Engineering
Coimbatore Coimbatore

Submitted for the University Viva-Voce Examination held


on…………………

----------------------- ---------------------------
INTERNAL EXAMINER EXTERNAL EXAMINER
ACKNOWLEDGEMENT

It is our privilege to express our gratitude to our Secretary Shri. P. NEELRAJ and Director Dr. N. VINOTH for giving us constant inspiration and motivation to pursue this project work.

It gives us immense pleasure to express our gratitude to our beloved Principal Dr. P. MALATHI, M.E., Ph.D., for her valuable support and encouragement throughout the project work.

We wish to express our sincere thanks and gratitude to Mr. S. MUKUNTHAN, M.Tech., Assistant Professor and Head, Department of Electronics and Communication Engineering, for providing invaluable insights into the subject and helping us whenever possible.

We extend our sincere thanks to Mrs. S.G. RAMA PRIYANGA, M.E., our internal guide and Assistant Professor, Department of Electronics and Communication Engineering, for her valuable guidance and suggestions toward the successful completion of the project.

We wish to extend our thanks to all Teaching and non-teaching staff

members of the Department of Electronics and Communication Engineering for

their kind and patient help throughout the project work.

We express heartfelt thanks to our parents and friends for their support

throughout our career. We would like to thank everyone who had helped us

directly and indirectly in this project work. We thank the Lord Almighty.
ABSTRACT

Lung diseases are disorders that affect the lungs, the organs that carry out breathing. Lung cancer is one of the most common causes of death among people throughout the world. Early detection of lung cancer can increase the chance of survival: the overall survival rate for lung cancer patients increases from 14% to 49% if the disease is detected in time.

Although Computed Tomography (CT) is more efficient than X-ray imaging, multiple imaging methods are generally required to complement each other to obtain a comprehensive diagnosis.

In this work, a deep neural network that identifies lung cancer from CT images is proposed. A densely connected convolutional neural network (DenseNet) and an adaptive boosting (AdaBoost) algorithm are used to classify a lung image as normal or malignant. A dataset of 201 lung images is used, in which 85% of the images are used for training and 15% for testing and classification. Experimental results show that the proposed method achieves an accuracy of 90.85%.

LIST OF CONTENTS

CHAPTER TITLE PAGE NO


ABSTRACT i
LIST OF FIGURES v
LIST OF ABBREVIATIONS vi

1 INTRODUCTION 1
1.1 Prediction 1
1.2 Staging 2
1.3 Survey 3

2 LITERATURE SURVEY 4
2.1 Lung cancer detection &
classification using deep learning 4
2.2 Detection and classification of lung
abnormalities by use of convolutional
neural network (CNN) and regions with
CNN features (R-CNN) 4
2.3 Detection and classification of
pulmonary nodules using convolutional
neural network 5
2.4 Multiple resolution residually connected
feature streams, automatic lung tumor
segmentation from CT images 6
2.5 Lung image patch classification
with automatic feature learning 7
3 SYSTEM REQUIREMENTS 8
3.1 Software Used 8
3.2 Language 8
3.3 Software Details 8
3.4 Python Modeling and Simulation 12
3.5 Python Advantages 13
3.6 Python Applications 15

4 SYSTEM ANALYSIS : EXISTING MODEL 18


4.1 Existing System 18
4.2 Limitations 18
4.3 Main Cause 20

5 SYSTEM ANALYSIS: PROPOSED MODEL 24


5.1 Proposed System 24
5.2 Block Diagram 24
5.3 Working 25
5.4 Advantages 26

6 MODULE DESCRIPTION 27
6.1 Pre-processing 27
6.2 Feature Selection 28
6.3 Feature Extraction 29
6.4 CNN Layers 31
6.5 Data Augmentation 37
6.6 AdaBoost Algorithm 38

7 RESULT AND ANALYSIS 41
7.1 Training Part Result 41
7.2 Classification Part Result 43

8 CONCLUSION 44
REFERENCE 45

LIST OF FIGURES

FIG NO TITLES PAGE NO


3.3.1 Python Features 9
3.3.2 Python Standard type
Hierarchy 11
4.1 Existing model 23
5.2 Block diagram of
Proposed system 24
6.1 Data Processing 27
6.3 Feature Extraction 30
6.4 CNN Layers 31
6.4.1 Fully Connected Structure 37
6.5 Augmentation Network 38
6.6 AdaBoost Algorithm 40
7.1 Training Part Result 41
7.2 Classification Part Result 43

LIST OF ABBREVIATIONS

ACRONYM ABBREVIATION
ADABOOST Adaptive Boosting
AI Artificial Intelligence
API Application Programming
Interface
CAD Computer Aided Diagnosis
CNN Convolutional Neural Network
CT Computed Tomography
EDA Exploratory Data Analysis
GLCM Gray-Level Co-occurrence Matrix
HU Hounsfield Unit
IDRI Image Database Resource Initiative
LIDC Lung Image Database Consortium
LUNA 16 Lung Nodule Analysis 2016
MRI Magnetic Resonance Imaging
MRRN Multiple Resolution Residually
Connected Network
NSCLC Non Small Cell Lung Cancer
PyPI Python Package Index
ROI Region of Interest

CHAPTER-1

INTRODUCTION

Lung cancer has become one of the most common causes of death in the world. It is one of the most harmful malignant tumors to human health: its mortality rate ranks first among malignant tumor deaths, and it is the number one killer among cancer deaths in men and women worldwide. There are about 1.8 million new cases of lung cancer per year (13% of all tumors) and 1.6 million deaths (19.4% of all tumor deaths) in the world [4], and the 5-year survival rate is only 18%. Lung cancer is a disease of abnormal cells multiplying and growing into a tumor. The mortality rate of lung cancer is the highest among all types of cancer. An estimated 85 percent of lung cancer cases in males and 75 percent in females are caused by cigarette smoking. Lung cancer is one of the most dreadful diseases in developing countries, and its mortality rate is 19.4%.

1.1. PREDICTION

Lung cancer is one of the most serious cancers in the world, with the lowest survival rate after diagnosis and a gradual increase in the number of deaths every year. Survival from lung cancer is directly related to how far it has grown by the time it is detected, and people have a much higher chance of survival if the cancer is detected in the early stages. Cancer cells can be carried away from the lungs in blood or in the lymph fluid that surrounds lung tissue. Lymph flows through lymphatic vessels, which drain into lymph nodes located in the lungs and in the center of the chest. Lung cancer is one of the deadliest diseases in developing countries, and detecting the cancer at an early stage is a challenge. Analysis and cure of lung malignancy have been among the greatest difficulties faced by humans over the most recent couple of decades, and early identification of tumors would help save a huge number of lives across the globe every year. This paper presents an approach which utilizes a Convolutional Neural Network (CNN) to classify the tumors found in the lung as malignant or benign. The accuracy obtained by means of CNN is 96%, which is more efficient when compared to the accuracy obtained by traditional neural network systems.

1.2. STAGING

Lung cancer often spreads toward the center of the chest, because the natural flow of lymph out of the lungs is toward the center of the chest. Lung cancer can be divided into two main groups, non-small cell lung cancer and small cell lung cancer; these types are assigned based on their cellular characteristics. Staging is based on tumor size and lymph node location. Presently, CT is said to be more effective than plain chest X-ray in detecting and diagnosing lung cancer. Early detection of lung tumors is done using many imaging techniques such as Computed Tomography (CT), Sputum Cytology, Chest X-ray, and Magnetic Resonance Imaging (MRI). Detection means classifying a tumor into two classes: (i) non-cancerous tumor (benign) and (ii) cancerous tumor (malignant). The chance of survival at an advanced stage is low compared to when the cancer is diagnosed and treated at an early stage. Manual analysis and diagnosis systems can be greatly improved with the implementation of image processing techniques. A number of studies on image processing techniques for early-stage cancer detection are available in the literature, but the hit ratio of early-stage detection has not greatly improved. With the advancement of machine learning techniques, early diagnosis of cancer has been attempted by many researchers. Neural networks play a key role in recognizing cancer cells among normal tissues, which in turn provides an effective tool for building an assistive AI-based cancer detection system. Cancer treatment will be effective only when the tumor cells are accurately separated from the normal cells. Classification of the tumor cells and training of the neural network form the basis of machine learning based cancer diagnosis.

1.3. SURVEY

Lung cancer has become one of the most significant diseases in human history. The World Health Organization estimates the worldwide death toll from lung cancer will be 10,000,000 by 2030. The 5-year survival rate for advanced Non Small Cell Lung Cancer (NSCLC) remains disappointingly low. It has been hypothesized that quantitative image feature analysis can improve diagnostic, prognostic, or predictive accuracy, and therefore will have an impact on a significant number of patients. In the current study, standard-of-care clinical computed tomography (CT) scans were used for image feature extraction. In order to reduce variability in feature extraction, the first and essential step is to accurately delineate the lung tumors. Accurate delineation of lung tumors is also crucial for optimal radiation oncology. A common approach to delineating tumors from CT scans involves radiologists or radiation oncologists manually drawing the boundary of the tumor. In the majority of cases, manual segmentation overestimates the lesion volume to ensure the entire lesion is identified, and the process is highly variable. A stable, accurate segmentation is critical, as image features (such as texture- and shape-related features) are sensitive to small tumor boundary changes.

CHAPTER-2

LITERATURE SURVEY

2.1. TITLE: Lung cancer detection and classification using deep
learning - 2019

AUTHORS: Ruchita Tekade, K. Rajeshwari.

In recent years, many Computer Aided Diagnosis (CAD) systems have been designed for the diagnosis of several diseases. Lung cancer detection at an early stage has become very important, and also much easier, with image processing and deep learning techniques. In this study, Computed Tomography (CT) scan images of lung patients are used to detect and classify lung nodules and to detect the malignancy level of those nodules. The CT scan images are segmented using the U-Net architecture. This paper proposes a 3D multipath VGG-like network, which is evaluated on 3D cubes extracted from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), Lung Nodule Analysis 2016 (LUNA16), and Kaggle Data Science Bowl 2017 datasets. Predictions from the U-Net and the 3D multipath VGG-like network are combined for the final results. The lung nodules are classified and the malignancy level is detected using this architecture with 95.60% accuracy and a log loss of 0.387732.

DRAWBACK

The lack of strict clinical guidelines and the resemblance between different ILD findings make radiological diagnosis problematic.

2.2. TITLE: Detection and classification of lung abnormalities by use of
convolutional neural network (CNN) and regions with CNN
features (R-CNN) - 2018
AUTHORS: Shoji Kido, Yasushi Hirano, Noriaki Hashimoto.

An image-based computer-aided diagnosis (CADx) algorithm that uses a convolutional neural network (CNN) does not necessarily require an image-feature extractor. Therefore, image-based CADx is powerful compared with feature-based CADx, which requires an image-feature extractor for the differential diagnosis of lung abnormalities such as lung nodules and diffuse lung diseases. We have also developed an image-based computer-aided detection (CADe) algorithm that uses regions with CNN features (R-CNN) for the detection of lung abnormalities. We evaluated the performance of image-based CADx using CNN and that of image-based CADe using R-CNN for various kinds of lung abnormalities such as lung nodules and diffuse lung diseases.

DRAWBACK

Unlike human faces or handwritten characters, there is no obvious structure in 32 × 32 patches of lung images.

2.3. TITLE: Detection and classification of Pulmonary Nodules using
Convolutional Neural Networks - 2019

AUTHORS: Patrice Monkam, Shouliang Qi, He Ma, Weiming Gao,
Yudong Yao, Wei Qian

CT screening has been proven to be effective for diagnosing lung cancer


at its early manifestation in the form of pulmonary nodules, thus decreasing mortality. However, the exponential increase of image data makes accurate assessment a very challenging task, given that the number of radiologists is limited and they are overworked. Recently, numerous methods,
especially ones based on deep learning with convolutional neural network
(CNN), have been developed to automatically detect and classify pulmonary
nodules in medical images. In this paper, we present a comprehensive analysis
of these methods and their performances. First, we briefly introduce the
fundamental knowledge of CNNs as well as the reasons for their suitability to medical image analysis.

2.4. TITLE: Multiple Resolution Residually connected feature
streams, Automatic lung tumor segmentation from CT
images - 2018

AUTHORS: Jue Jiang, Yu-Chi Hu, Chia-Ju Liu, Darragh Halpenny,
Matthew D. Hellmann, Joseph O. Deasy.

Volumetric lung tumor segmentation and accurate longitudinal tracking


of tumor volume changes from computed tomography images are essential
for monitoring tumor response to therapy. Hence, we developed two
multiple resolution residually connected network (MRRN) formulations
called incremental-MRRN and dense-MRRN. Our networks simultaneously
combine features across multiple image resolution and feature levels
through residual connections to detect and segment the lung tumors. We
evaluated our method on a total of 1210 non-small cell lung cancer (NSCLC) tumors and nodules from three data sets, consisting of 377 tumors from the open-source Cancer Imaging Archive (TCIA), 304 advanced-stage NSCLC tumors treated with anti-PD-1 checkpoint immunotherapy from the internal MSKCC institutional data set, and 529 lung nodules from the Lung Image Database Consortium (LIDC).

2.5. TITLE: Lung Image Patch Classification with Automatic Feature
Learning - 2015

AUTHORS: Qing Li, Weidong Cai, David Dagan Feng

Automatic feature learning from image data has thus emerged as a different trend recently, to capture the intrinsic image features without manual feature design. In this paper, we propose to create multi-scale feature extractors based on an unsupervised learning algorithm, and to obtain the image feature vectors by convolving the feature extractors with the image patches. The auto-generated image features are data-adaptive and highly descriptive.

CHAPTER-3

SYSTEM REQUIREMENTS

3.1. SOFTWARE USED

SOFTWARE : PYTHON

3.2. LANGUAGE

PROGRAMMING LANGUAGE USED : PYTHON

3.3. SOFTWARE DETAILS

3.3.1. PYTHON FEATURES

Python is a multi-paradigm programming language. Object-oriented


programming and structured programming are fully supported, and many of its
features support functional programming and aspect-oriented programming
(including metaprogramming and metaobjects (magic methods)). Many
other paradigms are supported via extensions, including design by contract and
logic programming. Python uses dynamic typing and a combination of reference
counting and a cycle-detecting garbage collector for memory management. It
also features dynamic name resolution (late binding), which binds method and
variable names during program execution.

Python's design offers some support for functional programming in the


Lisp tradition. It has filter, map, and reduce functions; list
comprehensions, dictionaries, sets, and generator expressions. The standard

library has two modules (itertools and functools) that implement functional
tools borrowed from Haskell and Standard ML.
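A minimal illustration of these functional tools (our own example, not from the report; the values are arbitrary):

from functools import reduce
from itertools import count, islice

nums = [1, 2, 3, 4, 5]

evens = list(filter(lambda n: n % 2 == 0, nums))   # [2, 4]
doubled = list(map(lambda n: 2 * n, nums))         # [2, 4, 6, 8, 10]
squares = [n * n for n in nums]                    # list comprehension
total = reduce(lambda a, b: a + b, nums)           # 15

# itertools supplies lazy iterator tools, e.g. the first five odd numbers:
odds = list(islice((n for n in count(1) if n % 2), 5))   # [1, 3, 5, 7, 9]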

Fig 3.3.1 Python Features

3.3.2. PYTHON PHILOSOPHY

The language's core philosophy is summarized in the document The Zen


of Python (PEP 20), which includes aphorisms such as:

• Beautiful is better than ugly.


• Explicit is better than implicit.
• Simple is better than complex.
• Complex is better than complicated.
• Readability counts.

Rather than having all of its functionality built into its core, Python was
designed to be highly extensible. This compact modularity has made it
particularly popular as a means of adding programmable interfaces to existing

applications. Van Rossum's vision of a small core language with a large
standard library and easily extensible interpreter stemmed from his frustrations
with ABC, which espoused the opposite approach. Python strives for a simpler,
less-cluttered syntax and grammar while giving developers a choice in their
coding methodology. In contrast to Perl's "there is more than one way to do it"
motto, Python embraces a "there should be one—and preferably only one—
obvious way to do it" design philosophy. Alex Martelli, a Fellow at the Python
Software Foundation and Python book author, writes that "To describe
something as 'clever' is not considered a compliment in the Python culture.

Python's developers strive to avoid premature optimization, and reject


patches to non-critical parts of the CPython reference implementation that
would offer marginal increases in speed at the cost of clarity. When speed is
important, a Python programmer can move time-critical functions to extension
modules written in languages such as C, or use PyPy, a just-in-time compiler.
Cython is also available, which translates a Python script into C and makes
direct C-level API calls into the Python interpreter. An important goal of
Python's developers is keeping it fun to use. This is reflected in the language's
name—a tribute to the British comedy group Monty Python and in occasionally
playful approaches to tutorials and reference materials, such as examples that
refer to spam and eggs (from a famous Monty Python sketch) instead of the
standard foo and bar.

A common neologism in the Python community is pythonic, which can


have a wide range of meanings related to program style. To say that code is
pythonic is to say that it uses Python idioms well, that it is natural or shows
fluency in the language, that it conforms with Python's minimalist philosophy
and emphasis on readability. In contrast, code that is difficult to understand or
reads like a rough transcription from another programming language is called

unpythonic. Users and admirers of Python, especially those considered
knowledgeable or experienced, are often referred to as Pythonistas.

Fig 3.3.2 Python standard Type Hierarchy

3.3.3. INDENTATION SYNTAX

Python uses whitespace indentation, rather than curly brackets or


keywords, to delimit blocks. An increase in indentation comes after certain
statements; a decrease in indentation signifies the end of the current block.
Thus, the program's visual structure accurately represents the program's
semantic structure.[6] This feature is sometimes termed the off-side rule, which
some other languages share, but in most languages indentation doesn't have any
semantic meaning.
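A short sketch of how indentation delimits blocks (our own example):

def classify(score):
    # The indented lines below belong to the 'if' and 'else' branches;
    # dedenting ends each block, with no braces or 'end' keywords.
    if score > 0.5:
        label = "malignant"
    else:
        label = "normal"
    return label

print(classify(0.8))   # prints: malignant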

3.4. PYTHON MODELING AND SIMULATION

Modeling and Simulation in Python is an introduction to physical


modeling using a computational approach. It is organized in three parts:

• The first part presents discrete models, including a bike share system and
world population growth.
• The second part introduces first-order systems, including models of
infectious disease, thermal systems, and pharmacokinetics.
• The third part is about second-order systems, including mechanical
systems like projectiles, celestial mechanics, and rotating rigid bodies.

Taking a computational approach makes it possible to work with more


realistic models than what you typically see in a first-year physics class, with
the option to include features like friction and drag. Python is an ideal
programming language for this material. It is a good first language for people
who have not programmed before, and it provides high-level data structures that
are well-suited to express solutions to the problems we are interested in.

Modeling and Simulation in Python is a Free Book. It is available under


a Creative Commons license, which means that you are free to copy,
distribute, and modify it, as long as you attribute the source and don’t use it for
commercial purposes.

3.4.1. PYTHON SIMPY SIMULATOR

SimPy is a process-based discrete-event simulation framework based on


standard Python. Processes in SimPy are defined by Python generator functions
and may, for example, be used to model active components like customers,
vehicles or agents. SimPy also provides various types of shared resources to
model limited capacity congestion points (like servers, checkout counters and

tunnels). Simulations can be performed “as fast as possible”, in real time (wall
clock time) or by manually stepping through the events. Though it is
theoretically possible to do continuous simulations with SimPy, it has no
features that help you with that. On the other hand, SimPy is overkill for
simulations with a fixed step size where your processes don’t interact with each
other or with shared resources.
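As a sketch of the process-and-resource style described above (our own hypothetical example, not from the report), a queue of patients sharing one CT scanner could be simulated as:

import simpy

def patient(env, name, scanner):
    # Each patient is a generator-function process that requests
    # the shared scanner resource.
    with scanner.request() as req:
        yield req                     # wait until the scanner is free
        print(name, "starts scan at t =", env.now)
        yield env.timeout(10)         # scanning takes 10 time units
        print(name, "finishes at t =", env.now)

env = simpy.Environment()
scanner = simpy.Resource(env, capacity=1)   # a limited-capacity congestion point
for i in range(3):
    env.process(patient(env, "patient-%d" % i, scanner))
env.run()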

3.5. PYTHON ADVANTAGES

The diverse application of the Python language is a result of the


combination of features which give this language an edge over others. Some of
the benefits of programming in Python include:

3.5.1. Presence of Third Party Modules

The Python Package Index (PyPI) contains numerous third-party modules


that make Python capable of interacting with most of the other languages and
platforms.

3.5.2. Extensive Support Libraries

Python provides a large standard library which includes areas like internet protocols, string operations, web services tools, and operating system interfaces. Many high-use programming tasks have already been scripted into the standard library, which significantly reduces the length of code to be written.

3.5.3 Open Source and Community Development

Python language is developed under an OSI-approved open source


license, which makes it free to use and distribute, including for commercial

purposes. Further, its development is driven by the community which
collaborates for its code through hosting conferences and mailing lists, and
provides for its numerous modules.

3.5.4. Learning Ease and Support Available

Python offers excellent readability and uncluttered simple-to-learn syntax


which helps beginners to utilize this programming language. The code style
guidelines, PEP 8, provide a set of rules to facilitate the formatting of code.
Additionally, the wide base of users and active developers has resulted in a rich
internet resource bank to encourage development and the continued adoption of
the language.

3.5.5. User-friendly Data Structures

Python has built-in list and dictionary data structures which can be used
to construct fast runtime data structures. Further, Python also provides the
option of dynamic high-level data typing which reduces the length of support
code that is needed.

3.5.6. Productivity and Speed

Python has clean object-oriented design, provides enhanced process


control capabilities, and possesses strong integration and text processing
capabilities and its own unit testing framework, all of which contribute to the
increase in its speed and productivity. Python is considered a viable option for
building complex multi-protocol network applications.

3.6. PYTHON APPLICATIONS

Python supports cross-platform operating systems which makes


building applications with it all the more convenient. Some of the globally
known applications such as YouTube, BitTorrent, Dropbox, etc. use Python to achieve their functionality.

3.6.1. Web Development

Python can be used to make web-applications at a rapid rate. Why is that?


It is because of the frameworks Python uses to create these applications. There
is common-backend logic that goes into making these frameworks and a number
of libraries that can help integrate protocols such as HTTPS, FTP, SSL etc. and
even help in the processing of JSON, XML, E-Mail and so much more. Some of
the most well-known frameworks are Django, Flask, and Pyramid. The security, scalability, and convenience that they provide are unparalleled compared to starting the development of a website from scratch.

3.6.2. Game Development

Python is also used in the development of interactive games. There are


libraries such as PySoy, a 3D game engine supporting Python 3, and PyGame, which provides functionality and a library for game development.
Games such as Civilization-IV, Disney’s Toontown Online, Vega Strike etc.
have been built using Python.


3.6.3. Machine Learning and Artificial Intelligence


Machine Learning and Artificial Intelligence are the talks of the town as
they yield the most promising careers for the future. We make the computer
learn based on past experiences through the data stored or better yet, create
algorithms which makes the computer learn by itself. The programming
language that mostly everyone chooses? It’s Python. Why? Because these domains are supported by libraries that already exist, such as Pandas, Scikit-Learn,
NumPy and so many more. Learn the algorithm, use the library and you have
your solution to the problem. It is that simple. But if you want to go the
hardcore way, you can design your own code which yields a better solution,
which still is much easier compared to other languages.

3.6.4. Data Science and Data Visualization

Data is money if you know how to extract relevant information which


can help you take calculated risks and increase profits. You study the data you
have, perform operations and extract the information required. Libraries such as
Pandas and NumPy help you in extracting information. You can even visualize the data using libraries such as Matplotlib and Seaborn, which are helpful in plotting graphs
and much more. This is what Python offers you to become a Data Scientist.

3.6.5. Desktop GUI

Python can be used to program desktop applications. It provides the


Tkinter library that can be used to develop user interfaces. There are some other useful toolkits such as wxWidgets, Kivy, and PyQt that can be used to create
applications on several platforms. You can start out with creating simple

applications such as Calculators, To-Do apps and go ahead and create much
more complicated applications.

3.6.6. Web Scraping Applications


Python can be used to pull a large amount of data from websites which
can then be helpful in various real-world processes such as price comparison,
job listings, research and development and much more.

Python has a library called Beautiful Soup which can be used to pull such data and process it accordingly.

3.6.7. Business Applications

Business applications are different from our normal applications, covering domains such as e-commerce, ERP, and many more.

CHAPTER-4

SYSTEM ANALYSIS: EXISTING MODEL

4.1. EXISTING SYSTEM

Roy, Sirohi, and Patle developed a system to detect lung cancer nodules using a fuzzy inference system and an active contour model. This system uses gray transformation for image contrast enhancement. Image binarization is performed before segmentation, and the resulting image is segmented using the active contour model. Cancer classification is performed using the fuzzy inference method. Features like area, mean, entropy, correlation, major axis length, and minor axis length are extracted to train the classifier. Overall accuracy of the system is 94.12%. A limitation is that it does not classify the cancer as benign or malignant, which is noted as future scope of that model. Ignatious and Joseph [8] developed a system using watershed segmentation. In pre-processing it uses a Gabor filter to enhance the image quality, and it compares the accuracy against a neural fuzzy model and the region growing method. The accuracy of the proposed model is 90.1%, which is comparatively higher than the models using segmentation with the neural fuzzy model and the region growing method. The advantage of this model is that it uses marker-controlled watershed segmentation, which solves the over-segmentation problem. As a limitation, it does not classify the cancer as benign or malignant, and although the accuracy is high, it is still not satisfactory. Some changes and contributions to this model may raise the accuracy to a satisfactory level.

4.2 LIMITATIONS

Gonzalez and Ponomaryvo proposed a system that classifies lung cancer as benign or malignant. The system uses a priori information and the Hounsfield Unit (HU) to calculate the Region of Interest (ROI). Shape features like area, eccentricity, circularity, and fractal dimension, and textural features like mean, variance, energy, entropy, skewness, contrast, and smoothness, are extracted to train a support vector machine that classifies whether the nodule is benign or malignant. The advantage of this model is that it classifies cancer as benign or malignant; however, its limitation is that prior information about the region of interest is required.

The model's classification of nodules as benign or malignant using a support vector machine can be useful in our new model. Analyzing the literature reviews on the basis of accuracy and the advantages of the steps used, the system proposed by Ignatious and Joseph is the current best solution. In image pre-processing it uses a Gabor filter to enhance the image, and it uses the marker-controlled watershed method for segmentation to detect the cancer nodule. This model also extracts only features like area, perimeter, and eccentricity of the cancer nodules. It shows a comparison with other previously proposed models and highlights its accuracy of 90.1%, which is higher than theirs. Even though this system is the current best solution (Fig 4.1), it has some limitations: only a few features have been extracted for the cancer nodules, and no preprocessing such as noise removal or image smoothing, which could probably assist in detecting nodules more accurately, has been implemented.

No classification of the extracted cancer as benign or malignant has been performed (Ignatious and Joseph, 2015). Changes to this current best solution have therefore been made and a new model has been proposed. Instead of the Gabor filter, a median filter and a Gaussian filter have been implemented in the pre-processing stage. After pre-processing, the processed image is segmented using watershed segmentation, which gives the image with the cancer nodules marked. In addition to features like area, perimeter, and eccentricity, features like centroid, diameter, and pixel mean intensity have been extracted in the feature extraction stage for the detected cancer nodules. The best model ends after the detection of the cancer nodule, its feature extraction, and the calculation of accuracy, but its classification as benign or malignant has not been implemented. Therefore, an additional stage of classifying the cancer nodule has been added using a Support Vector Machine: extracted features are used as training features, a trained model is generated, and an unknown detected cancer nodule is then classified using that trained prediction model. Firstly, in image pre-processing, a median filter is used on the grayscale CT scan images, since some noise is embedded in CT images at the time of image acquisition, which aids in false detection of nodules.

4.3 MAIN CAUSE

Lung cancer is one of the leading causes of cancer deaths (Suren Makaju et al., Procedia Computer Science 125 (2018) 107-114). It is difficult to detect because it arises and shows symptoms only in the final stage. However, the mortality rate can be reduced by early detection and treatment of the disease. CT imaging, the best imaging technique, is reliable for lung cancer diagnosis because it can disclose every suspected and unsuspected lung cancer nodule. However, variance of intensity in CT scan images and misjudgment of anatomical structures by doctors and radiologists might cause difficulty in marking the cancerous cells. Recently, to assist radiologists and doctors in detecting the cancer accurately, Computer Aided Diagnosis has become a supplementary and promising tool [3].

Many systems have been developed, and research is ongoing, on the detection of lung cancer. However, some systems do not have satisfactory detection accuracy and some still have to be improved to approach an accuracy of 100%. Image processing and machine learning techniques have been implemented to detect and classify lung cancer. We studied recent systems developed for cancer detection based on CT scan images of the lungs to choose the best recent systems, conducted an analysis of them, and proposed a new model. Several researchers have proposed and implemented detection of lung cancer using different approaches of image processing and machine learning. Aggarwal, Furquan, and Kalra proposed a model that classifies between nodules and normal lung anatomy structures. The method extracts geometrical, statistical, and gray-level characteristics. LDA is used as the classifier and optimal thresholding for segmentation. The system has 84% accuracy, 97.14% sensitivity, and 53.33% specificity.

Although the system detects the cancer nodule, its accuracy is still unacceptable: no machine learning technique has been used for classification, and only simple segmentation techniques are used. Therefore, combining any of its steps into our new model offers no likelihood of improvement. Jin, Zhang, and Jin used a convolutional neural network as the classifier in their CAD system to detect lung cancer.

The system has 84.6% accuracy, 82.5% sensitivity, and 86.7% specificity. The advantage of this model is that it uses a circular filter in the Region of Interest (ROI) extraction phase, which reduces the cost of the training and recognition steps. Although the implementation cost is reduced, the accuracy is still unsatisfactory. Sangamithraa and Govindaraju [6] use the K-means unsupervised learning algorithm for clustering or segmentation; it groups the pixel dataset according to certain characteristics. For classification this model implements a back-propagation network. Features like entropy, correlation, homogeneity, PSNR, and SSIM are extracted using the gray-level co-occurrence matrix (GLCM) method.

The system has an accuracy of about 90.7%. In image pre-processing a median filter is used for noise removal, which can be useful in our new model to remove noise and improve accuracy. The remaining systems relevant to this analysis, by Roy, Sirohi, and Patle, by Ignatious and Joseph, and by Gonzalez and Ponomaryvo, were reviewed in Sections 4.1 and 4.2 above.

Fig 4.1 Existing Model

CHAPTER-5

SYSTEM ANALYSIS: PROPOSED MODEL

5.1 PROPOSED SYSTEM

To address this problem, bionic convolutional neural networks are proposed to reduce the number of parameters and adapt the network architecture specifically to vision tasks. Convolutional neural networks are usually composed of a set of layers that can be grouped by their functionalities.

5.2 BLOCK DIAGRAM

[Block diagram: training images and testing images each pass through pre-processing, feature selection, and feature extraction; both feature streams feed the CNN, which classifies each image as a normal image or an abnormal image.]

Fig 5.2 Block Diagram of Proposed System

5.3 WORKING

The project's working is divided into two parts, i.e., the training and testing parts. A sample of 100 images is used, where 60 images are used for training and the remaining 40 images are used for testing.

In the training part, images are first pre-processed: resizing and blur removal are done using histogram equalization, through which non-linear images are stretched and pixel values are redistributed. The small image received is made visible to human eyes using data augmentation, which prevents overfitting and biasing. Feature selection is done to select images based on the given colour and size, and the image is converted from the spatial domain to the frequency domain. The LH and HL frequency sub-bands of the images are selected in feature extraction.

In the testing part, the same pipeline is applied: the images are pre-processed with histogram equalization, data augmentation is applied, feature selection converts the images from the spatial domain to the frequency domain, and the LH and HL frequency sub-bands are selected in feature extraction.
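A minimal sketch of this spatial-to-frequency step using a 2-D discrete wavelet transform (our assumption: the report does not name the wavelet, so a Haar wavelet and the PyWavelets package are used here for illustration):

import numpy as np
import pywt   # PyWavelets

image = np.random.rand(128, 128)   # stands in for a pre-processed CT slice

# A single-level 2-D DWT decomposes the image into an approximation
# sub-band (LL) and three detail sub-bands (LH, HL, HH).
LL, (LH, HL, HH) = pywt.dwt2(image, 'haar')

# As described above, the LH and HL sub-bands are kept as features.
features = np.concatenate([LH.ravel(), HL.ravel()])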

The results obtained from the training and testing parts are fed into the CNN layers, where the images are classified and the output is obtained. The algorithm used for classification is ADABOOST (Adaptive Boosting), where the accuracy calculation for the images is done based on the sample weights of the images.

5.4 ADVANTAGES

➢ The main advantage of CNN compared to its predecessors is that it
automatically detects the important features without any human
supervision.
➢ It is useful for classifying lung cancer images for accurate detection.
➢ Lung cancer can be detected at an early stage.

CHAPTER 6
MODULE DESCRIPTION

6.1 PRE-PROCESSING
Pre-processing refers to the transformations applied to our data before feeding it to the algorithm. Data pre-processing is a technique used to convert raw data into a clean data set. In other words, whenever data is gathered from different sources, it is collected in a raw format which is not feasible for analysis.

Fig 6.1 Data Processing

6.1.1 Need for Data Processing

For achieving better results from the applied model in machine learning projects, the data has to be in a proper format. Some machine learning models need information in a specified format; for example, the Random Forest algorithm does not support null values, so null values have to be managed in the original raw data set before the algorithm is executed. Another aspect is that the data set should be formatted in such a way that more than one machine learning or deep learning algorithm can be executed on it, and the best of them is chosen.
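A brief sketch of such a pre-processing step for a CT slice (our own example using OpenCV; the 224x224 target size is an assumption, not the report's setting):

import cv2
import numpy as np

def preprocess(path, size=(224, 224)):
    # Resize the slice and redistribute pixel values with histogram
    # equalization, as described in Section 5.3.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, size)
    img = cv2.equalizeHist(img)             # stretch the intensity histogram
    return img.astype(np.float32) / 255.0   # scale to [0, 1] for the network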

6.2 FEATURE SELECTION

Dimensionality reduction is the process of reducing the number of random


variables under consideration, by obtaining a set of principal variables. It can be
divided into feature selection and feature extraction.
Dimensionality reduction is an important factor in predictive modeling. Various proposed methods have introduced different approaches to do so, either graphical or based on methods like filtering, wrapping, or embedding. However, most of these approaches are based on threshold values and benchmark algorithms that determine the optimality of the features in the dataset.

One motivation for dimensionality reduction is that higher-dimensional data sets increase the time complexity, and the space required is also greater. Moreover, not all the features in the dataset may be useful: some may contribute no information at all, while some may contribute the same information as other features. Selecting the optimal set of features hence helps us reduce the space and time complexity as well as increase the accuracy or purity of classification (or regression) and clustering (or association) for supervised and unsupervised learning respectively.

Feature selection has four different approaches: the filter approach, wrapper approach, embedded approach, and hybrid approach.

1. Wrapper approach: This approach has high computational complexity. It uses a learning algorithm to evaluate the accuracy produced by the use of the selected features in classification. Wrapper methods can give high classification accuracy for particular classifiers.
2. Filter approach: A subset of features is selected by this approach without using any learning algorithm. Higher-dimensional datasets use this method, and it is relatively faster than the wrapper-based approaches. A minimal sketch of this approach is shown after this list.
3. Embedded approach: The applied learning algorithms determine the specificity of this approach, and it selects the features during the process of training the data set.
4. Hybrid approach: Both filter and wrapper-based methods are used in the hybrid approach. It first selects a possible optimal feature set, which is further tested by the wrapper approach. It hence uses the advantages of both the filter and wrapper-based approaches.
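The sketch below illustrates the filter approach (our own example with scikit-learn; the toy data stands in for per-image feature vectors):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# Filter approach: score each feature independently of any learner
# (here with an ANOVA F-test) and keep the k highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=10)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)   # (200, 10)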

6.3 FEATURE EXTRACTION

If the number of features becomes similar to (or even bigger than) the number of observations stored in a dataset, this can most likely lead to a machine learning model suffering from overfitting. In order to avoid this type of problem, it is necessary to apply either regularization or dimensionality reduction techniques (feature extraction). In machine learning, the dimensionality of a dataset is equal to the number of variables used to represent it.

Using regularization could certainly help reduce the risk of overfitting, but using feature extraction techniques instead can also lead to other types of advantages, such as:

• Accuracy improvements.

• Overfitting risk reduction.

• Speed-up in training.

• Improved data visualization.

• Increase in the explainability of our model.

Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones (and then discarding the original features). This new, reduced set of features should then be able to summarize most of the information contained in the original set of features. In this way, a summarized version of the original features can be created from a combination of the original set. Another commonly used technique to reduce the number of features in a dataset is feature selection. The difference between feature selection and feature extraction is that feature selection aims instead to rank the importance of the existing features in the dataset and discard the less important ones (no new features are created).
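As a sketch of this distinction, Principal Component Analysis (PCA) is one common feature extraction technique (our own example; the report does not specify PCA):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 64 pixel features per image

# Feature extraction: build 10 new features as combinations of the
# 64 originals and discard the originals (unlike feature selection,
# which would keep a subset of the originals unchanged).
pca = PCA(n_components=10)
X_new = pca.fit_transform(X)
print(X.shape, "->", X_new.shape)     # (1797, 64) -> (1797, 10)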

Fig 6.3 Feature Extraction


6.4 CNN LAYERS

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning


algorithm which can take in an input image, assign importance (learnable
weights and biases) to various aspects/objects in the image and be able to
differentiate one from the other. The pre-processing required in a ConvNet is
much lower as compared to other classification algorithms. While in primitive
methods filters are hand-engineered, with enough training, ConvNets have the
ability to learn these filters/characteristics.

The architecture of a ConvNet is analogous to that of the connectivity


pattern of Neurons in the Human Brain and was inspired by the organization of
the Visual Cortex. Individual neurons respond to stimuli only in a restricted
region of the visual field known as the Receptive Field. A collection of such
fields overlap to cover the entire visual area.

Fig 6.4 CNN Layers

Why ConvNets over Feed-Forward Neural Nets?

[Figure: flattening of a 3×3 image matrix into a 9×1 vector]

An image is nothing but a matrix of pixel values, so why not just flatten the image (e.g. a 3×3 image matrix into a 9×1 vector) and feed it to a Multi-Layer Perceptron for classification purposes? In practice, this does not work well.

In cases of extremely basic binary images, the method might show an


average precision score while performing prediction of classes but would have
little to no accuracy when it comes to complex images having pixel
dependencies throughout.

A ConvNet is able to successfully capture the Spatial and Temporal


dependencies in an image through the application of relevant filters. The
architecture performs a better fitting to the image dataset due to the reduction in
the number of parameters involved and reusability of weights. In other words,
the network can be trained to understand the sophistication of the image better.

There are four layered concepts we should understand in Convolutional Neural
Networks:

1. Convolution,
2. ReLU,
3. Pooling, and
4. Full Connectedness (Fully Connected Layer).

6.4.1 Convolutional Layer

The convolutional layer is the core building block of a CNN. The layer's
parameters consist of a set of learnable filters (or kernels), which have a small
receptive field, but extend through the full depth of the input volume. During
the forward pass, each filter is convolved across the width and height of the
input volume, computing the dot product between the entries of the filter and
the input and producing a 2-dimensional activation map of that filter. As a
result, the network learns filters that activate when it detects some specific type
of feature at some spatial position in the input.

Stacking the activation maps for all filters along the depth dimension
forms the full output volume of the convolution layer. Every entry in the output
volume can thus also be interpreted as an output of a neuron that looks at a
small region in the input and shares parameters with neurons in the same
activation map.

When dealing with high-dimensional inputs such as images, it is


impractical to connect neurons to all neurons in the previous volume because
such a network architecture does not take the spatial structure of the data into
account. Convolutional networks exploit spatially local correlation by enforcing

a sparse local connectivity pattern between neurons of adjacent layers: each
neuron is connected to only a small region of the input volume.

The extent of this connectivity is a hyperparameter called the receptive field of


the neuron. The connections are local in space (along width and height), but
always extend along the entire depth of the input volume. Such an architecture
ensures that the learnt filters produce the strongest response to a spatially local
input pattern.
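A minimal sketch of such a layer in Keras (our own example; the 128x128 input and 16 filters are illustrative, not the report's configuration):

import tensorflow as tf

# 16 learnable filters, each with a 3x3 receptive field that extends
# through the full depth of the input (one grayscale channel). Each
# filter is convolved across width and height, producing 16 stacked
# activation maps.
conv = tf.keras.layers.Conv2D(filters=16, kernel_size=3, activation="relu",
                              input_shape=(128, 128, 1))
x = tf.random.normal((1, 128, 128, 1))   # one dummy CT slice
print(conv(x).shape)                      # (1, 126, 126, 16)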

6.4.2 ReLU Layer

In a neural network, the activation function is responsible for transforming


the summed weighted input from the node into the activation of the node or
output for that input.

The rectified linear activation function is a piecewise linear function that will output the input directly if it is positive; otherwise, it will output zero. It has become the default activation function for many types of neural networks because a model that uses it is easier to train and often achieves better performance.

In order to use stochastic gradient descent with back propagation of


errors to train deep neural networks, an activation function is needed that looks
and acts like a linear function, but is, in fact, a nonlinear function allowing
complex relationships in the data to be learned.
The function must also provide more sensitivity to the activation sum input and
avoid easy saturation.

A node or unit that implements this activation function is referred to as


a rectified linear activation unit, or ReLU for short. Often, networks that use
the rectifier function for the hidden layers are referred to as rectified networks.
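The function itself is a one-liner (our own sketch):

import numpy as np

def relu(x):
    # Piecewise linear: pass positive values through, clamp negatives to zero.
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # [0.  0.  0.  1.5]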

6.4.3 Pooling Layer

Pooling layers provide an approach to down sampling feature maps by


summarizing the presence of features in patches of the feature map. Two
common pooling methods are average pooling and max pooling that summarize
the average presence of a feature and the most activated presence of a feature
respectively.

A pooling layer is a new layer added after the convolutional layer.


Specifically, after a nonlinearity (e.g. ReLU) has been applied to the feature
maps output by a convolutional layer. The addition of a pooling layer after the
convolutional layer is a common pattern used for ordering layers within a
convolutional neural network that may be repeated one or more times in a given
model.

The pooling layer operates on each feature map separately to create a new set of the same number of pooled feature maps. Pooling involves selecting a pooling operation, much like a filter to be applied to the feature maps. The size of the pooling operation or filter is smaller than the size of the feature map; specifically, it is almost always 2×2 pixels applied with a stride of 2 pixels. This means that the pooling layer will always reduce the size of each feature map by a factor of 2, i.e. each dimension is halved, reducing the number of pixels or values in each feature map to one quarter of the original size. For example, a pooling layer applied to a feature map of 6×6 (36 pixels) will result in an output pooled feature map of 3×3 (9 pixels).

Two common functions used in the pooling operation are:

• Average Pooling: Calculate the average value for each patch on the feature
map.

• Maximum Pooling (or Max Pooling): Calculate the maximum value for each
patch of the feature map.
The result of using a pooling layer and creating down sampled or pooled feature
maps is a summarized version of the features detected in the input. They are
useful as small changes in the location of the feature in the input detected by the
convolutional layer will result in a pooled feature map with the feature in the
same location. This capability added by pooling is called the model’s invariance
to local translation.
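A small sketch of 2×2 max pooling with stride 2, reproducing the 6×6 to 3×3 example above (our own code):

import numpy as np

def max_pool(fmap, size=2, stride=2):
    # Slide a 2x2 window with stride 2 and keep the maximum of each
    # patch, halving each spatial dimension of the feature map.
    h, w = fmap.shape
    out = np.zeros((h // stride, w // stride))
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            out[i // stride, j // stride] = fmap[i:i + size, j:j + size].max()
    return out

fmap = np.arange(36, dtype=float).reshape(6, 6)   # a 6x6 feature map
print(max_pool(fmap).shape)                        # (3, 3), i.e. 9 values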

6.4.4 Fully Connected Layer

Fully connected layers are an essential component of Convolutional Neural


Networks (CNNs), which have been proven very successful in recognizing and
classifying images for computer vision. The CNN process begins with
convolution and pooling, breaking down the image into features, and analyzing
them independently. The result of this process feeds into a fully connected
neural network structure that drives the final classification decision.

The objective of a fully connected layer is to take the results of the


convolution/pooling process and use them to classify the image into a label.

The output of convolution/pooling is flattened into a single vector of values,


each representing a probability that a certain feature belongs to a label. For
example, if the image is of a cat, features representing things like whiskers or
fur should have high probabilities for the label “cat”.

The fully connected part of the CNN network goes through its own back
propagation process to determine the most accurate weights. Each neuron
receives weights that prioritize the most appropriate label. Finally, the neurons
“vote” on each of the labels, and the winner of that vote is the classification
decision.
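A minimal sketch of this classification head in Keras (our own example; the layer sizes are illustrative):

import tensorflow as tf

# Flatten the pooled feature maps into one vector, then let dense
# layers "vote" between the two labels (normal vs. malignant).
head = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(3, 3, 16)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),   # one probability per label
])
print(head(tf.random.normal((1, 3, 3, 16))).shape)    # (1, 2)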

Fig 6.4.1 Fully Connected Structure

6.5 DATA AUGMENTATION


Data augmentation is the process of increasing the amount and diversity of data. We do not collect new data; rather, we transform the already present data.

1. Need for data augmentation


Data augmentation is an integral process in deep learning, as in deep
learning we need large amounts of data and in some cases it is not feasible to
collect thousands or millions of images, so data augmentation comes to the
rescue.
It helps us to increase the size of the dataset and introduce variability in the
dataset.

2. Operations in data augmentation
The most commonly used operations, illustrated in the sketch below, are:
1. Rotation
2. Shearing
3. Zooming
4. Cropping
5. Flipping
6. Changing the brightness level
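A sketch covering these six operations with Keras's ImageDataGenerator (our own example; the parameter values are illustrative, not the report's settings):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,              # 1. rotation (degrees)
    shear_range=0.1,                # 2. shearing
    zoom_range=0.1,                 # 3. zooming
    width_shift_range=0.1,          # 4. crop-like shifts
    horizontal_flip=True,           # 5. flipping
    brightness_range=(0.8, 1.2),    # 6. brightness changes
)
# Typical use: batches = augmenter.flow(x_train, y_train, batch_size=32)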

Fig 6.5 Augmentation Network

6.6 ADABOOST ALGORITHM


AdaBoost is short for Adaptive Boosting. AdaBoost was the first really successful boosting algorithm developed for binary classification, and it is the best starting point for understanding boosting. Moreover, modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines.

Generally, AdaBoost is used with short decision trees. After the first tree is created, the performance of the tree on each training instance is used to weight how much attention the next tree that is created should pay to each training instance. Training data that is hard to predict is given more weight, whereas easy-to-predict instances are given less weight.
• AdaBoost is best used to boost the performance of decision trees on binary classification problems.
• AdaBoost was originally called AdaBoost.M1 by its authors. More recently it may be referred to as discrete AdaBoost, because it is used for classification rather than regression.
• AdaBoost can be used to boost the performance of any machine learning algorithm, and it is best used with weak learners.
Each instance in the training dataset is weighted. The initial weight is set to:
weight(xi) = 1/n
where xi is the i-th training instance and n is the number of training instances.

A weak classifier is prepared on the training data using the weighted samples. Only binary classification problems are supported, so each decision stump makes one decision on one input variable and outputs a +1.0 or -1.0 value for the first or second class value.

The misclassification rate is calculated for the trained model. Traditionally, this is calculated as:

error = (N – correct) / N

where error is the misclassification rate, correct is the number of training instances predicted correctly by the model, and N is the total number of training instances.
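
As a worked example of the two formulas above, here is a sketch in plain Python with made-up numbers:

# Worked example of the weighting formulas above (made-up numbers).
import math

n = 10                      # number of training instances
weights = [1.0 / n] * n     # weight(xi) = 1/n, so 0.1 each

correct = 8                 # suppose a stump classifies 8 of 10 correctly
error = (n - correct) / n   # misclassification rate = 0.2

# Stage value that weights this learner's vote (discussed below):
stage = math.log((1 - error) / error)   # ln(0.8 / 0.2) ≈ 1.386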

• Weak models are added sequentially, each trained using the weighted training data.
• The process continues until a pre-set number of weak learners have been created.
• Once completed, you are left with a pool of weak learners, each with a stage value. The stage value weights how much say a learner has in the final prediction and is traditionally calculated as stage = ln((1 – error) / error).

Predictions are made by calculating the weighted average of the weak classifiers.

For a new input instance, each weak learner calculates a predicted value as either +1.0 or -1.0. The predicted values are weighted by each weak learner's stage value, and the prediction for the ensemble model is taken as the sum of the weighted predictions. If the sum is positive, the first class is predicted; if negative, the second class is predicted.
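
For reference, a hedged sketch using scikit-learn's AdaBoostClassifier, which implements this boosting scheme with decision stumps as the default weak learner, is given below; the synthetic data stands in for the extracted image features and normal/malignant labels of this work.

# Minimal AdaBoost sketch with scikit-learn; decision stumps are the
# default weak learner. The synthetic data is a stand-in only.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

print(clf.predict(X[:5]))           # weighted-vote predictions of the ensemble
print(clf.estimator_weights_[:5])   # per-learner stage values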

Fig 6.6 AdaBoost Algorithm

CHAPTER 7
RESULT AND ANALYSIS

7.1 TRAINING PART RESULT

Fig 7.1 Training Part Result

The convolutional segmentation neural network has been implemented in Python, and the system is trained with sample datasets so that the model learns to recognize lung cancer. A sample image is then fed as input to the trained model, which at this stage is able to tell whether cancer is present and to locate the cancer spot in the sample lung image. The process involves feeding in the input image, preprocessing, feature extraction, identifying the cancer spot and indicating the results to the user. If malignancy is present, a message indicating its presence is displayed on the screen along with the given input image.
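
A minimal sketch of this inference flow is given below. The file names ('model.h5', 'sample_ct.png'), the 128x128 input size and the label convention (1 = malignant) are assumptions for illustration only.

# Hedged sketch of the inference flow described above. File names,
# the 128x128 input size and the label convention are assumptions.
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

model = load_model('model.h5')                       # previously trained CNN

img = image.load_img('sample_ct.png', color_mode='grayscale',
                     target_size=(128, 128))
x = image.img_to_array(img) / 255.0                  # preprocessing: scale to [0, 1]
x = np.expand_dims(x, axis=0)                        # add a batch dimension

probs = model.predict(x)[0]
if np.argmax(probs) == 1:                            # assumed: label 1 = malignant
    print('Malignancy detected in the input image.')
else:
    print('No malignancy detected.')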

7.2 CLASSIFICATION PART RESULT

Fig 7.2 Classification Part Result

Lung cancer detection uses a convolutional neural network trained by end-to-end learning. The CNN used for training the model has two convolution layers and two subsampling layers, which helps to increase the accuracy of detection. The confusion matrix parameters are derived from the CNN output.

The confusion matrix shows the true positives, true negatives, false positives and false negatives. From the analysis, true positives are the lung cancer images classified correctly as cancerous; false positives are non-cancerous images wrongly predicted as cancerous, while false negatives are cancerous images wrongly predicted as non-cancerous.
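
These quantities can be derived from the model's predictions as in the sketch below; the label arrays are placeholders, with 1 standing for cancerous and 0 for non-cancerous.

# Sketch of deriving the confusion-matrix parameters from predictions.
# y_true and y_pred are placeholders (1 = cancerous, 0 = non-cancerous).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f'TP={tp}  TN={tn}  FP={fp}  FN={fn}')
print(f'Accuracy = {(tp + tn) / (tp + tn + fp + fn):.2f}')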

CHAPTER 8

CONCLUSION

The main advantage of deep learning over other machine learning algorithms is its capacity to perform feature engineering on its own. A deep learning algorithm will scan the data to search for features that correlate and combine them to enable faster learning, without being explicitly told to do so.

A Convolutional Neural Network takes advantage of local spatial coherence in the input (often images), which allows it to have fewer weights because some parameters are shared. This process, taking the form of convolutions, makes CNNs especially well suited to extracting relevant information at a low computational cost.

The training and testing of images are carried out: images are pre-processed, and feature selection and feature extraction are performed. Once the training and testing parts are completed successfully, the CNN algorithm classifies the input lung image as either normal or abnormal, and the output is displayed.

Hence, a deep CNN is used for the classification of lung images for the detection of cancer.
