A PROJECT REPORT
Submitted by
BHARANIDHARAN M 731119205003
BHARANITHARAN B 731119205004
of
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
MAY 2023
ANNA UNIVERSITY : CHENNAI 600 025
BONAFIDE CERTIFICATE
and helped us to complete our project successfully. Our wholehearted thanks to
Engineering, Erode, for his constant encouragement and moral support during the
College of Engineering, Erode, for furnishing every essential facility for doing
our project and also for her valuable suggestions and constant guidance
College of Engineering, Erode, for her valuable help and guidance throughout the
project. We thank our class advisor and all other staff members of our department
The accuracy obtained using the VGG16, VGG19, and DenseNet201 models was
93.4%, 95.2%, and 94.7%, respectively. The high accuracy achieved by the
proposed approaches demonstrates their potential for accurate and early
detection of these lung diseases. Comparing the three architectures,
DenseNet201 demonstrated the highest performance across all three disease
classifications.
TABLE OF CONTENTS
ABSTRACT i
LIST OF FIGURES iv
LIST OF TABLES v
LIST OF ABBREVIATIONS vi
1 INTRODUCTION 1
1.1 LUNG DISEASE CLASSIFICATION 1
1.3 CNN 2
1.5 VGG16 5
1.6 VGG19 5
1.7 DENSENET 5
2 LITERATURE REVIEW 6
2.1 EXISTING SYSTEMS 6
3 REQUIREMENTS SPECIFICATION 8
3.1 HARDWARE REQUIREMENTS 8
3.3.1 PYTHON 8
4 SYSTEM IMPLEMENTATION 11
4.1 PROPOSED SYSTEM DESIGN 11
4.2 MODELS 11
4.2.1 VGG16 11
4.2.2 VGG19 12
4.2.3 DENSENET 14
6 CONCLUSION 25
7 APPENDIX 1 26
8 APPENDIX 2 36
9 REFERENCES 40
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
DL - DEEP LEARNING
CNN - CONVOLUTIONAL NEURAL NETWORK
CHAPTER 1
INTRODUCTION
The use of deep learning models for lung disease classification has
several advantages, including increased accuracy, reduced human error,
and the ability to automate the process of diagnosis, potentially leading to
faster and more efficient treatment. These models have the potential to
assist medical professionals in making more accurate and timely diagnoses
of lung diseases, leading to better patient outcomes.
1.2 DEEP LEARNING
Deep learning (also known as deep structured learning) is part of a
broader family of machine learning methods based on artificial neural
networks with representation learning. Learning can be supervised,
semi-supervised or unsupervised.
Deep-learning architectures such as deep neural networks, deep belief
networks, deep reinforcement learning, recurrent neural networks,
convolutional neural networks and Transformers have been applied to
fields including computer vision, speech recognition, natural language
processing, machine translation, bioinformatics, drug design, medical
image analysis, climate science, material inspection and board game
programs, where they have produced results comparable to, and in some
cases surpassing, human expert performance.
1.3 CNN
2. ReLU Layer: This layer applies the rectified linear activation
function element-wise to the output of the convolutional layer. It
introduces non-linearity into the network and helps in learning
complex representations.
3. Pooling Layer: This layer reduces the spatial dimensions of the
feature maps by applying a downsampling operation. Common types
of pooling are max pooling and average pooling.
4. Fully Connected Layer: This layer connects all the neurons of the
previous layer to the next layer, similar to a traditional neural
network. It transforms the feature maps into a vector of class scores.
5. Softmax Layer: This layer applies the softmax function to the output
of the last fully connected layer to produce a probability distribution
over the classes.
These layers are typically stacked on top of each other to form a deep
neural network. The CNN architecture can be modified and customized for
specific tasks by varying the number of layers, the filter sizes, the pooling
sizes, and other hyperparameters.
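As an illustrative sketch (not the project's code), the ReLU, max-pooling and softmax operations described above can be written directly in NumPy:

```python
import numpy as np

def relu(x):
    """Element-wise rectified linear activation: negatives become zero."""
    return np.maximum(0, x)

def max_pool(x, size=2):
    """Max pooling: keep the largest value in each non-overlapping window."""
    h, w = x.shape
    return (x[:h // size * size, :w // size * size]
            .reshape(h // size, size, w // size, size)
            .max(axis=(1, 3)))

def softmax(z):
    """Turn a vector of class scores into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

# A toy 4x4 feature map standing in for a convolutional layer's output.
fmap = np.array([[-1., 2., 0., 3.],
                 [ 4., -2., 1., 0.],
                 [ 0., 1., -3., 2.],
                 [ 2., 0., 5., -1.]])

pooled = max_pool(relu(fmap))      # 2x2 map: [[4, 3], [2, 5]]
probs = softmax(pooled.flatten())  # probabilities over 4 "classes"
print(pooled)
```

Stacking these operations (after a convolution) is exactly the Conv → ReLU → Pool → Fully Connected → Softmax pipeline described above.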
Several CNN architecture models are available. From those, this project
uses three architectures: VGG16, VGG19 and DenseNet.
FIGURE 1.1 CNN LAYER
1.5 VGG16
1.7 DENSENET
CHAPTER 2
LITERATURE REVIEW
The existing system for lung disease classification using chest X-ray
images involves manual interpretation by radiologists, which can be
time-consuming and subject to human error. Traditional computer-aided
diagnosis systems for lung diseases use handcrafted features and classifiers,
which may not generalize well to different datasets and may require
domain-specific knowledge. Recent advances in deep learning have led to
the development of end-to-end deep neural networks, which can
automatically learn the features and classifiers from raw data, making them
more suitable for lung disease classification tasks.
However, deep learning models require a large amount of labeled
data for training, which can be a challenge in the medical domain due to the
limited availability of annotated medical images. Therefore, existing
systems for lung disease classification using deep learning typically rely on
pre-trained models that are fine-tuned on small datasets of medical images.
While these systems have shown promising results, they still face
challenges in terms of generalization to different datasets and
interpretability of the learned features.
Lastly, the existing system for lung disease classification using chest X-ray
images is limited to binary or multi-class classification tasks, where the
input image is classified into a single disease category or normal. However,
in reality, patients may have multiple co-existing diseases or require more
detailed diagnoses. Therefore, there is a need for more sophisticated deep
learning models that can perform multi-label or multi-disease classification.
CHAPTER 3
REQUIREMENTS SPECIFICATION
OS : Windows 10
Tool : Jupyter notebook
Language : Python
3.3.1 PYTHON
Python 2.0, released in 2000, introduced new features such as list
comprehensions, cycle-detecting garbage collection, reference counting,
and Unicode support. Python 3.0, released in 2008, was a major revision
that is not completely backward compatible with earlier versions. Python 2
was discontinued with version 2.7.18 in 2020. Python consistently ranks as
one of the most popular programming languages.
• Python can be treated in a procedural way, an object-oriented
way or a functional way.
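As a small illustration of that point (a toy example, not from the project), the same computation can be written in each of the three styles:

```python
# Summing squares of a list, in procedural, object-oriented and
# functional style; all three are equally valid Python.

def sum_squares_procedural(nums):
    # Procedural: explicit loop mutating an accumulator.
    total = 0
    for n in nums:
        total += n * n
    return total

class SquareSummer:
    """Object-oriented: state lives on an instance."""
    def __init__(self, nums):
        self.nums = nums
    def total(self):
        return sum(n * n for n in self.nums)

def sum_squares_functional(nums):
    # Functional: map + sum, no mutation.
    return sum(map(lambda n: n * n, nums))

print(sum_squares_functional([1, 2, 3]))  # 14
```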
CHAPTER 4
SYSTEM IMPLEMENTATION
4.2 MODELS
4.2.1 VGG16
The VGG16 architecture is characterized by its simplicity and the use of small
filters, which allow for a deeper network while keeping the number of
parameters relatively low. The architecture also uses max pooling layers after
each set of convolutional layers, which helps to reduce the spatial
dimensionality of the feature maps and makes the network more robust to small
spatial translations.
VGG16 has been widely used as a pre-trained model for various computer
vision tasks, such as image classification, object detection, and
segmentation. The pre-trained weights are available in many deep learning
libraries, such as TensorFlow, Keras, and PyTorch, which makes it easy to
use VGG16 for transfer learning.
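A minimal sketch of the transfer-learning idea (freeze the pre-trained base, train only a new head). Here a fixed random projection stands in for VGG16's convolutional layers, purely for illustration; every name below is an assumption, not the project's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen pre-trained base (e.g. VGG16's conv layers):
# a fixed feature extractor whose weights are never updated.
W_frozen = rng.normal(size=(4, 8))
W_frozen_init = W_frozen.copy()

def base_features(x):
    return np.maximum(0, x @ W_frozen)  # frozen weights plus ReLU

# New trainable head: logistic regression on top of the frozen features.
X = rng.normal(size=(64, 4))
y = (X[:, 0] > 0).astype(float)  # toy binary labels
F = base_features(X)
w = np.zeros(8)
for _ in range(200):  # gradient descent updates the head only
    p = 1 / (1 + np.exp(-(F @ w)))
    w -= 0.1 * F.T @ (p - y) / len(y)

# The base never changed; only the head was trained.
print(np.allclose(W_frozen, W_frozen_init), np.any(w != 0))
```

This mirrors what happens when a pre-trained VGG16 is loaded with its weights frozen and only a new classification head is fitted to the medical images.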
4.2.2 VGG19
The use of X-ray images to identify specific forms of chest ailments
is demonstrated using a VGG19 followed by a CNN model. The model is
depicted in detail in Fig. 4.2. Chest X-ray images with a dimension of
224 × 224 × 3 were used as input data for our model. The VGG19 pre-trained
model is followed by three CNN blocks during the feature extraction stage.
VGG19 is designed to provide high accuracy for large-scale image
applications. The feature architecture comprised 19 CNN layers with 3 × 3
convolution filters and a stride of 1. Multiple deep learning models were
merged with the VGG19 to improve image classification accuracy. Each
CNN block includes a convolution layer with ReLU as its activation
function. Following these three CNN blocks, a batch normalization and
a max-pooling layer were applied, which were then followed by a dropout
layer, as indicated in Fig. 4.2.
In the feature extraction step, the output was turned into a
one-dimensional data vector through the flattening layer, which was then
used as input to the classification stage. The classification stage
comprises three dense layers with 512, 256, and 128 neurons, respectively.
A final dense layer with six neurons and the SoftMax activation function
generates the classification output. This layer is responsible for
classifying the input image into one of six chest disease classes,
including pneumonia, tuberculosis, lung cancer, and lung opacity. A total
of 24,622,470 model parameters fall into two categories. First were the
trainable parameters (24,622,342), which were updated throughout the
training process; the best values for these parameters were required to
ensure the training accuracy. The second category was the non-trainable
parameters (128), which did not change during training.
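The parameter split quoted above is easy to sanity-check directly (the counts come from the report itself):

```python
# Trainable vs non-trainable parameter counts reported for the model.
trainable = 24_622_342      # updated during training
non_trainable = 128         # frozen, e.g. batch-normalization statistics
total = trainable + non_trainable
print(total)  # 24622470, matching the reported total
```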
Figure 4.2 VGG19 Architecture
4.2.3 DENSENET Model
ADVANTAGES
The advantages of our project include that real-time images can be given
as input, the detection accuracy is high, and the time taken to execute
the code is significantly less than that of the previous existing models.
4.3.2 DATA SET
Figure 4.6 Covid X-ray image
4.3.3 DATASET PRE-PROCESSING
After normalizing each pixel in the image to the interval [0,1], all
images were transformed to array data representation.
The preprocessing steps can vary depending on the type of data and the
specific task. It is important to carefully preprocess the data to ensure that
the neural network can learn from it effectively and make accurate
predictions.
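A minimal sketch of this normalization step with NumPy (the tiny array below stands in for a real X-ray image):

```python
import numpy as np

# A stand-in for an 8-bit grayscale X-ray image (pixel values 0-255).
image = np.array([[0, 128],
                  [64, 255]], dtype=np.uint8)

# Scale every pixel to the interval [0, 1], as done before training.
normalized = image.astype(np.float32) / 255.0

print(normalized.min(), normalized.max())  # 0.0 1.0
```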
CHAPTER 5
5.1.1 ACCURACY
This project proposed a viable deep learning strategy for lung
disease classification which revealed good performance under various
conditions such as light variation, orientation variation and scale variation.
The project introduces a CNN that includes improvements in architecture,
data augmentation and refinements in parameter values. An innovative
customized dataset was obtained in real time for the training and validation
of the submitted CNN model. Besides data augmentation, several
adaptations to the conventional CNN model served as testimonials for the
CNN tactic's accurate, efficient, fast detection and learning capability
across a decent range of lung diseases. The observed performance improvements
were validated by notable reductions in miss rate and false positive rate.
The proffered CNN model outperformed both the Fast R-CNN and Mask
R-CNN models in terms of precision, recall and F-measure.
Figure 5.1 Accuracy of VGG16
ALGORITHM    ACCURACY (%)
VGG16        95
VGG19        96
DENSENET     97
Epoch    Training Loss    Training Accuracy    Validation Loss    Validation Accuracy
5.1.2 PRECISION
Precision = TP / (TP + FP)
5.1.3 RECALL
Recall = TP / (TP + FN)
5.1.4 F1-SCORE
F1-Score = 2 × (P × R) / (P + R)
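As an illustrative sketch, the three metrics defined above can be computed from raw counts (the TP, FP and FN values below are made up, not from the report's experiments):

```python
def precision(tp, fp):
    # Fraction of positive predictions that were correct.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual positives that were found.
    return tp / (tp + fn)

def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * (p * r) / (p + r)

# Illustrative counts for one class.
tp, fp, fn = 90, 10, 5
p = precision(tp, fp)   # 0.9
r = recall(tp, fn)      # 90/95 ≈ 0.947
print(round(f1_score(p, r), 3))  # 0.923
```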
PERFORMANCE PARAMETERS
[Bar chart: Precision, Recall and F1-Score (in percentage) for the three
models; values range between 92 and 98.]
5.1.5 SENSITIVITY AND SPECIFICITY
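Sensitivity and specificity can be read off the confusion matrix in the same way as the metrics above; a minimal binary-case sketch with illustrative counts, mirroring the computation used in Appendix 1:

```python
import numpy as np

# Illustrative 2x2 confusion matrix: rows = actual, columns = predicted.
cm = np.array([[80, 20],    # actual positive: 80 TP, 20 FN
               [10, 90]])   # actual negative: 10 FP, 90 TN

sensitivity = cm[0, 0] / (cm[0, 0] + cm[0, 1])  # TP / (TP + FN)
specificity = cm[1, 1] / (cm[1, 0] + cm[1, 1])  # TN / (FP + TN)
print(sensitivity, specificity)  # 0.8 0.9
```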
CHAPTER 6
CONCLUSION
The CNN algorithm performed best in terms of efficiency and low
latency. The proposed algorithm improves the identification of lung
diseases from images, which can lead to further enhancement of the
project, and diagnosis can be further improved. Convolutional Neural
Networks are generally used in deep learning for most image classification
projects, and various architectures are used along with them. CNN is one
of the main classification algorithms that detects regions of interest
using bounding boxes. Using the CNN algorithm we obtained an accuracy of
98%, while the SVM algorithm provides 87%; this comparison is based on
the analysis of the literature survey, and real-time execution may differ.
The scope for future enhancement is that other machine learning and
deep learning algorithms can be fused together to provide better accuracy
on larger datasets, with lower latency than the current scenario.
APPENDIX 1
import argparse
import os
import random
import shutil

import Augmentor
import cv2
import matplotlib.pyplot as mat
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from imutils import paths
from scipy.spatial.distance import cdist
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import DenseNet201, VGG16, VGG19
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.layers import Input
from tensorflow.keras.utils import to_categorical

# Dataset directories (one folder per class)
normal = "E:/lung_52/dataset/train/NORMAL"
PNEUMONIA = "E:/lung_52/dataset/train/PNEUMONIA"
covid = "E:/lung_52/dataset/train/COVID19"
tuberculosis = "E:/lung_52/dataset/train/TURBERCULOSIS"

# Path lists
dir_normal = os.listdir(normal)
dir_PNEUMONIA = os.listdir(PNEUMONIA)
dir_covid = os.listdir(covid)
dir_tubercl = os.listdir(tuberculosis)

# Display a sample of normal X-ray images
mat.figure(figsize=(16, 12))
for i in range(6):
    ran = random.choice((1, 30))
    normal1 = [os.path.join(normal, f) for f in dir_normal[ran : ran + 1]]
    rand = random.choice(normal1)
    mat.subplot(3, 3, i + 1)
    img = mat.imread(rand)
    mat.imshow(img, cmap="gray")
    mat.axis(False)
    mat.title("Normal X-ray images")
mat.show()

# Display a sample of COVID X-ray images
mat.figure(figsize=(16, 12))
for i in range(6):
    ran = random.choice((1, 30))
    covid1 = [os.path.join(covid, f) for f in dir_covid[ran : ran + 1]]
    rand = random.choice(covid1)
    mat.subplot(3, 3, i + 1)
    img = mat.imread(rand)
    mat.imshow(img, cmap="gray")
    mat.axis(False)
    mat.title("Covid X-ray images")
mat.show()

# Load one class: resize each image to 224x224 pixels (ignoring aspect
# ratio), convert BGR to RGB and scale pixels to [0, 1].  The report's
# listing repeats this block once per class.
def load_class(directory, class_label):
    data, labels = [], []
    for imagePath in paths.list_images(directory):
        image = cv2.imread(imagePath)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (224, 224))
        data.append(image)
        labels.append(class_label)
    return np.array(data) / 255, np.array(labels)

data, labels = load_class(normal, 0)
data1, labels1 = load_class(covid, 1)
data2, labels2 = load_class(PNEUMONIA, 2)
# The report's listing reuses label 2 here; tuberculosis needs its own label.
data3, labels3 = load_class(tuberculosis, 3)

# Combine the four classes and one-hot encode the labels
trainX = np.concatenate([data, data1, data2, data3])
label = np.concatenate([labels, labels1, labels2, labels3])
trainY = to_categorical(label)

(trainX, valX, trainY, valY) = train_test_split(
    trainX, trainY, test_size=0.20, random_state=42
)

INIT_LR = 1e-3
EPOCHS = 15
BS = 8

# baseModel = VGG16(weights="imagenet", include_top=False,
#                   input_tensor=Input(shape=(224, 224, 3)))
# baseModel = VGG19(weights="imagenet", include_top=False,
#                   input_tensor=Input(shape=(224, 224, 3)))
baseModel = DenseNet201(
    weights="imagenet", include_top=False, input_tensor=Input(shape=(224, 224, 3))
)

callbacks = [
    EarlyStopping(monitor="val_loss", patience=8),
    ModelCheckpoint(filepath="best_model.h5", monitor="val_loss",
                    save_best_only=True),
]

# (The report's listing omits the classification head, the model.fit call
# and the test split; H is the History object returned by model.fit, and
# testX/testY/predIdxs come from the held-out test data.)
acc = H.history["accuracy"]
loss = H.history["loss"]
val_loss = H.history["val_loss"]
val_acc = H.history["val_accuracy"]
epochs = range(len(H.epoch))

title1 = "Accuracy vs Validation Accuracy"
leg1 = ["Acc", "Val_acc"]
title2 = "Loss vs Val_loss"
leg2 = ["Loss", "Val_loss"]

def plot(epochs, acc, val_acc, leg, title):
    mat.plot(epochs, acc)
    mat.plot(epochs, val_acc)
    mat.title(title)
    mat.legend(leg)
    mat.xlabel("epochs")

# Confusion matrix with accuracy, sensitivity and specificity
cm = confusion_matrix(testY.argmax(axis=1), predIdxs)
sns.set(font_scale=1)  # label size
sns.heatmap(cm, cmap="Blues", annot=True, annot_kws={"size": 12})  # font size
plt.ylabel("Actual")
plt.xlabel("Predicted")
plt.figure(figsize=(16, 12))
total = sum(sum(cm))
acc = (cm[0, 0] + cm[1, 1]) / total
sensitivity = cm[0, 0] / (cm[0, 0] + cm[0, 1])
specificity = cm[1, 1] / (cm[1, 0] + cm[1, 1])

# show the confusion matrix, accuracy, sensitivity and specificity
print("acc: {:.4f}".format(acc))
print("sensitivity: {:.4f}".format(sensitivity))
print("specificity: {:.4f}".format(specificity))

ypred = model.predict(testX)
total = 0
accurate = 0
accurateindex = []
wrongindex = []
for i in range(len(ypred)):
    if np.argmax(ypred[i]) == np.argmax(testY[i]):
        accurate += 1
        accurateindex.append(i)
    else:
        wrongindex.append(i)
    total += 1

print(
    "Total-test-data;", total,
    "\taccurately-predicted-data:", accurate,
    "\t wrongly-predicted-data: ", total - accurate,
)
print("Accuracy:", round(accurate / total * 100, 3), "%")

label = {0: "Normal", 1: "Covid", 2: "Pneumonia", 3: "Tuberculosis"}
imidx = random.sample(accurateindex, k=9)  # replace with 'wrongindex'
nrows = 3
ncols = 3
fig, ax = plt.subplots(nrows, ncols, sharex=True, sharey=True, figsize=(15, 12))
plt.show()
APPENDIX 2
EPOCHS
ACCURACY
OUTPUT
REFERENCES
1. Chen, H., Li, W. and Yang, X., 2020. A whale optimization algorithm with
chaos mechanism based on quasi-opposition for global optimization
problems. Expert Systems with Applications, 158, p.113612.
2. Tuncer, T., Dogan, S. and Akbal, E., 2019. A novel local senary pattern
based epilepsy diagnosis system using EEG signals. Australasian Physical &
Engineering Sciences in Medicine, 42, pp. 939-948.
3. Khan, A., Khan, S.H., Saif, M., Batool, A., Sohail, A. and Waleed Khan, M.,
2023. A Survey of Deep Learning Techniques for the Analysis of COVID-
19 and their usability for Detecting Omicron. Journal of Experimental &
Theoretical Artificial Intelligence, pp.1-43.
4. Jackson, P., McIntosh, L., Hofman, M.S., Kong, G. and Hicks, R.J., 2020.
Rapid multiexponential curve fitting algorithm for voxel‐based targeted
radionuclide dosimetry. Medical Physics, 47(9), pp.4332-4339.
5. Li, C., Dong, D., Li, L., Gong, W., Li, X., Bai, Y., Wang, M., Hu, Z., Zha,
Y. and Tian, J., 2020. Classification of severe and critical covid-19 using
deep learning and radiomics. IEEE journal of biomedical and health
informatics, 24(12), pp.3585-3594.
6. Shi, J., Yuan, X., Elhoseny, M. and Yuan, X., 2020. Weakly supervised deep
learning for objects detection from images. In Urban Intelligence and
Applications: Proceedings of ICUIA 2019 (pp. 231-242). Springer
International Publishing.
7. Dansana, D., Kumar, R., Bhattacharjee, A., Hemanth, D.J., Gupta, D.,
Khanna, A. and Castillo, O., 2020. Early diagnosis of COVID-19-affected
patients based on X-ray and computed tomography images using deep
learning algorithm. Soft computing, pp.1-9.
8. Ravi, V., Narasimhan, H., Chakraborty, C. and Pham, T.D., 2022. Deep
learning-based metaclassifier approach for COVID-19 classification using
CT scan and chest X-ray images. Multimedia systems, 28(4), pp.1401-1415.
10. Gupta, A., Gupta, S. and Katarya, R., 2021. InstaCovNet-19: A deep
learning classification model for the detection of COVID-19 patients using
Chest X-ray. Applied Soft Computing, 99, p.106859.
11. Thakur, S. and Kumar, A., 2021. X-ray and CT-scan-based automated
detection and classification of covid-19 using convolutional neural networks
(CNN). Biomedical Signal Processing and Control, 69, p.102920.
12. Monowar, K.F., Hasan, M.A.M. and Shin, J., 2020. Lung opacity
classification with convolutional neural networks using chest X-rays.
In 2020 11th International Conference on Electrical and Computer
Engineering (ICECE), pp. 169-172. IEEE.