
CovidX: COVID-19 Detection Using Chest X-Ray

BACHELOR OF TECHNOLOGY
In
Computer Science and Engineering
By

Sakshi Prajapati (2018022019)


Rashi Verma (2017021085)
Ramakant (2017021084)

Under the Guidance of


Prof. M.K. Srivastava
Assistant Professor

Department of Computer Science and Engineering

MADAN MOHAN MALAVIYA UNIVERSITY OF TECHNOLOGY
Gorakhpur (U.P.) – INDIA
TABLE OF CONTENTS

Certificate
Candidate’s Declaration
Approval Sheet
Acknowledgement
Abstract
CHAPTER 1 INTRODUCTION
CHAPTER 2 LITERATURE REVIEW
CHAPTER 3 STATEMENT OF PROBLEM
CHAPTER 4 SOLUTION APPROACH
CHAPTER 5 WORK PROGRESS
CHAPTER 6 CONCLUSION
REFERENCES
CERTIFICATE

It is certified that Sakshi Prajapati, Rashi Verma and Ramakant have carried out the project work
presented in this report entitled “CovidX: COVID-19 Detection Using Chest X-Ray” for the award of
Bachelor of Technology in Computer Science and Engineering from Madan Mohan Malaviya
University of Technology (formerly Madan Mohan Malaviya Engineering College), Gorakhpur
(UP) under my supervision and guidance. The report embodies the results of original work and study
carried out by the students themselves, and the contents of the report do not form the basis for the award
of any other degree to the candidates or to anybody else.

Prof. M.K. Srivastava


Assistant Professor
Computer Science and Engineering Department
Madan Mohan Malaviya University of
Technology, Gorakhpur.
CANDIDATE’S DECLARATION

We declare that this written submission represents our work and ideas in our own words and,
where others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty and
integrity and have not misrepresented or falsified any idea/data/fact/source in our submission.
We understand that any violation of the above will be cause for disciplinary action by the
University and can also evoke penal action from the sources which have thus not been properly
cited or from whom proper permission has not been taken when needed.

Sakshi Prajapati
(Roll No. 201802201 )

Rashi Verma
(Roll No. 2017021085)

Ramakant
(Roll No. 2017021084)

B. Tech (CSE)
Department of Computer Science and Engineering
APPROVAL SHEET

This project report entitled “CovidX: COVID-19 Detection Using Chest X-Ray” by Sakshi
Prajapati, Rashi Verma and Ramakant is approved for the degree of Bachelor of
Technology in Computer Science and Engineering.

Examiner

Supervisor

Prof. M.K. Srivastava

Head of Department
Dr. P. K. Singh

Dean (UG)
Dr. D. K. Dwivedi

Date:

Place:
ACKNOWLEDGEMENT
It is a matter of great pleasure and satisfaction for me to present this dissertation work entitled
“CovidX: COVID-19 Detection Using Chest X-Ray” as a part of the curriculum for the award of
“Bachelor of Technology” from Madan Mohan Malaviya University of Technology,
Gorakhpur (U.P.) India.
I am very grateful to my Head of the Department, Dr. P. K. Singh. It has been truly reassuring to
know that he is always willing to share his quest for new problems and new solutions, which creates a very
challenging and rewarding environment for us. He provided all kinds of academic as well as
administrative support for the smooth completion of my dissertation work.
I am also very thankful to my supervisor, Prof. Muzammil Hasan, for encouraging me to
work in an emerging area of research, i.e. CovidX: COVID-19 Detection Using Chest X-Ray,
as well as for her continuous guidance and support throughout my work. I would also like to
thank all my classmates for their valuable suggestions and helpful discussions.
Finally, I am grateful to my family members, especially my beloved parents, for their
encouragement and tenderness. Without them, I would not have been able to gather enough strength
to finish this dissertation.

Date: 22/02/2021
ABSTRACT
The exponential increase in COVID-19 patients is disastrous for healthcare systems across the
world. With limited testing kits, it is impossible for every patient with respiratory illness to be
tested using the conventional technique (RT-PCR). The tests also have a long turn-around time and
limited sensitivity. Detecting possible COVID-19 infections on chest X-Ray may help
quarantine high-risk patients while test results are awaited. X-Ray machines are already available
in most healthcare systems, and with most modern X-Ray systems already digitized, there is no
transportation time involved for the samples either. In this work we propose the use of chest
X-Ray to prioritize the selection of patients for further RT-PCR testing. This may be useful in an
inpatient setting where present systems struggle to decide whether to keep a patient in
the ward along with other patients or isolate them in COVID-19 areas. It would also help in
identifying patients with a high likelihood of COVID-19 and a false negative RT-PCR result who would
need repeat testing. Further, we propose the use of modern AI techniques to detect COVID-19
patients from X-Ray images in an automated manner, particularly in settings where
radiologists are not available, which helps make the proposed testing technology scalable. We
present CovidX: COVID-19 AI Detector, a novel deep neural network based model to triage
patients for appropriate testing. On the publicly available covid-chestxray-dataset, our
model gives 97.5% accuracy with 100% sensitivity (recall) for COVID-19 infection.
Chapter:1 Introduction and Background

The COVID-19 pandemic has had a devastating impact on the well-being of
people around the world as well as on the global economy. In India, COVID-19 has infected
more than 7.24M people and caused more than 100k deaths. One effective method to
combat COVID-19 is to increase the testing capacity. However, this is not possible in
some places, such as India, due to the lack of testing kits and health facilities. Motivated
by the effort of the open source community in collecting a COVID-19 dataset of X-rays
and by the success of Deep Learning in previous studies with chest radiography, this
thesis builds a Convolutional Neural Network to detect COVID-19 using only
chest X-Ray images.
The use of X-Ray has several advantages over conventional diagnostic tests:

1. X-Ray imaging is much more widespread and cost effective than the
conventional diagnostic tests.

2. Transfer of digital X-Ray images does not require any transportation
from the point of acquisition to the point of analysis, thus making the
diagnostic process extremely quick.

3. Unlike CT scans, portable X-Ray machines also enable testing within
an isolation ward itself, hence reducing the requirement for additional
Personal Protective Equipment (PPE), an extremely scarce and
valuable resource in this scenario. It also reduces the risk of
hospital-acquired infection for the patients.
This thesis is organized as follows:
Section 1 is a quick introduction.
Section 2 briefly summarizes the history of Deep Learning.
Section 3 gives a quick introduction to Convolutional Neural Networks.
Section 4 discusses the main reasons for the rapid advancement of Deep Learning in
recent years.
Section 5 is about preparing the data, including setting up the working environment,
collecting the dataset, and preprocessing the data.
Section 6 presents the implementation of the models and the training process.
Section 7 discusses the results of the models.
Finally, Section 8 draws conclusions and discusses the limitations and suggestions for
future improvements.
Chapter:2 Literature Survey
1. Brief history of Deep Learning
In a 2016 talk titled “Deep Learning for Building Intelligent Computer Systems”, Jeff
Dean remarked that deep learning is really all about large neural networks. Deep
Learning, a subset of Artificial Intelligence, is a Machine Learning technique that
enables computers to solve problems that we are otherwise unable to explicitly program
them to solve. Even though a principled method for training deep networks had been available
since the 1980s, it could not scale to large networks, and Neural Network
research fell into a dark period.
When you hear the term deep learning, just think of a large deep neural net. Deep
learning can thus be defined as an artificial intelligence function that imitates the workings of
the human brain in processing data and creating patterns for use in decision making. As a
subset of Machine Learning, it uses networks capable of learning
unsupervised from data that is unstructured or unlabeled.

2. Dataset
• Overview of the Dataset
The first and most important part of a Deep Learning project is collecting data. For this
project, chest X-ray images are needed from 2 classes: COVID-19 positive and
COVID-19 negative (normal).

• Preparing data
Setting up Google Colaboratory: Before getting the data, it is necessary to set up the
working environment for the project. Google Colaboratory (Colab) is a free, cloud-based
Jupyter notebook environment that requires no setup.
Colaboratory allows users to write and execute code and to access powerful computing
resources for free from the browser. Most importantly, Colab provides a GPU,
which significantly speeds up the computationally intensive training process.
For these reasons, Colab has become very popular among Deep Learning and Data
Science enthusiasts who do not necessarily own a PC with expensive GPUs. The data is
processed and saved to the dataset folder in Google Drive, where further work can be done.
• Getting COVID-19 chest X-ray images
The first step in building the dataset is to download the COVID chest X-ray dataset.
We clone the dataset from its GitHub repository. The folder contains CT and chest X-ray
images with different chest views from COVID-19 patients.
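Because the cloned folder mixes CT and X-ray images and several chest views, the frontal-view COVID-19 X-rays have to be picked out before training. The following is a minimal sketch of that selection step, not the project's actual code. It assumes the repository ships a CSV metadata file with `filename`, `finding` and `view` columns and that frontal views are labelled `PA` or `AP`; these names and values are assumptions and should be checked against the cloned data.

```python
import csv
import io

def select_frontal_covid(metadata_file, finding="COVID-19", views=("PA", "AP")):
    """Return filenames whose finding mentions `finding` and whose view is frontal.

    Column names ("filename", "finding", "view") are assumed, not confirmed.
    """
    selected = []
    for row in csv.DictReader(metadata_file):
        is_covid = finding in (row.get("finding") or "")
        is_frontal = (row.get("view") or "") in views
        if is_covid and is_frontal:
            selected.append(row["filename"])
    return selected

# Tiny in-memory stand-in for the metadata file (rows are made up for illustration).
sample = io.StringIO(
    "filename,finding,view\n"
    "a.jpg,COVID-19,PA\n"
    "b.jpg,COVID-19,L\n"
    "c.jpg,SARS,PA\n"
    "d.jpg,COVID-19,AP\n"
)
frontal_covid_files = select_frontal_covid(sample)
print(frontal_covid_files)
```

With a real clone, the same function would be called on the opened metadata file instead of the in-memory sample.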

Image shows examples of chest X-ray images of COVID-19 infected patients

• Getting NORMAL chest X-ray images

The second step in building the dataset is to download the NORMAL chest X-ray
dataset. We again clone the dataset from its GitHub repository. The folder contains CT and
chest X-ray images with different chest views from patients.

Image shows examples of chest X-ray images of NORMAL patients

3. Preprocessing Data
• Data Visualization
These visualization techniques help us understand the network and may also be useful as an
approximate visual diagnosis for presentation to radiologists.
Saliency maps and Grad-CAMs generate a heatmap that shows which regions of the image
weigh most in the classification. Both visualizations are based on the same principle:
the derivative of the output class score with respect to an activation. If the derivative is small, then a
change in the activation will have a negligible impact on the output score, and therefore the
activation is unimportant for the classification. The two techniques differ in how the
derivative is back-propagated through the ReLUs. Saliency maps calculate the derivative
w.r.t. the input image, and thus generate a heatmap with the same resolution as the input.
Grad-CAM uses deeper feature maps, which typically results in better localization due to the
higher-level nature of the features in deeper layers, but these maps are available only at reduced
resolution due to pooling.
In our numerical experiments we used data from two
different databases: (1) the COVID-19 Chest X-ray dataset, and (2) the Normal chest X-ray
dataset, with 845 and 1266 patients respectively.
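To make the gradient-based heatmap idea concrete, here is a small sketch of the combination step commonly used in Grad-CAM: each feature map is weighted by the spatial average of the class-score gradient for that channel, the weighted maps are summed, and negative values are clipped with a ReLU. It is an illustration in plain Python with hand-written toy numbers. In practice the feature maps and gradients would come from the trained network through a deep learning framework, which is not shown here.

```python
def grad_cam(feature_maps, gradients):
    """Combine K feature maps (each H x W) into a single H x W heatmap.

    feature_maps[k][i][j]: activation of channel k at position (i, j)
    gradients[k][i][j]:    d(class score) / d(feature_maps[k][i][j])
    """
    num_channels = len(feature_maps)
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])

    # Channel weight = spatial average of that channel's gradient.
    weights = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(num_channels)
    ]

    # Weighted sum over channels, then ReLU to keep only positive evidence.
    heatmap = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            value = sum(weights[k] * feature_maps[k][i][j] for k in range(num_channels))
            heatmap[i][j] = max(0.0, value)
    return heatmap

# Toy example: two 2x2 feature maps with constant gradients (+1 and -1).
features = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
grads = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
cam = grad_cam(features, grads)
print(cam)
```

The resulting low-resolution map would normally be upsampled to the input size and overlaid on the X-ray.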


Visualization over COVID-19 positive X-Rays

• Data Augmentation
To achieve robust and generalized deep learning models, large amounts of data are needed.
We applied two different versions of augmentation to the dataset. In the first
version, we applied image augmentation techniques such as random rotation, width shift, height shift,
and horizontal and vertical flip operations using the ImageDataGenerator functionality from the
TensorFlow Keras framework. Generative adversarial networks (GANs) now offer a novel method
for data augmentation. Hence, in the second version, we used a CycleGAN architecture to increase the
number of images in the under-represented COVID-19 class. Utilizing the normal
class from our dataset, we trained the CycleGAN to transform normal images into COVID-19 images.
As a proof of concept at this stage, we generated 200 COVID-19 images to add to our original
training dataset. The figure shows a few examples of the original and generated images side by side. After
5000 iterations of generator and discriminator training, we achieved near-realistic generated
CXR images, though shape deformations are seen in some cases. Note that the dataset after
augmentation is still quite small; hence we employed five-fold cross-validation during training to avoid
over-fitting of the model, and the validation set served as a checkpoint for assessing the trained model’s
performance on unseen data.
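The geometric operations in the first augmentation version can be illustrated without any framework. The sketch below is not the Keras generator itself: it implements flips and integer pixel shifts on an image stored as a list of rows, and leaves out random rotation for brevity. The helper names, the zero fill value and the shift range are illustrative assumptions.

```python
import random

def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def vertical_flip(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]

def width_shift(image, shift, fill=0):
    """Shift pixels right by `shift` columns (left if negative), padding with `fill`."""
    width = len(image[0])
    shift = max(-width, min(width, shift))
    if shift >= 0:
        return [[fill] * shift + row[: width - shift] for row in image]
    return [row[-shift:] + [fill] * (-shift) for row in image]

def height_shift(image, shift, fill=0):
    """Shift pixels down by `shift` rows (up if negative), padding with `fill`."""
    height, width = len(image), len(image[0])
    shift = max(-height, min(height, shift))
    padding = [[fill] * width for _ in range(abs(shift))]
    if shift >= 0:
        return padding + [row[:] for row in image[: height - shift]]
    return [row[:] for row in image[-shift:]] + padding

def random_augment(image, rng, max_shift=1):
    """Apply each flip with probability 0.5, then shifts within +/- max_shift."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    if rng.random() < 0.5:
        image = vertical_flip(image)
    image = width_shift(image, rng.randint(-max_shift, max_shift))
    image = height_shift(image, rng.randint(-max_shift, max_shift))
    return image

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
augmented = random_augment(image, random.Random(0))
```

Each call to `random_augment` yields a new variant of the same X-ray, which is how augmentation enlarges the effective training set without new patients.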

4. CNN
CNN, an abbreviation for Convolutional Neural Network, is a specialized type of neural
network model designed for working with 1D, 2D and even 3D data.
But what is convolution?
Convolution is a linear operation that involves the multiplication of a set of weights with
the input, much like a traditional neural network. The Convolutional Neural Network (CNN) is
one of the most popular types of Deep Neural Networks and is very useful for computer
vision tasks. CNNs take images as input and filter them using convolution operations to obtain a
final vector that summarizes the interesting features of an image. Because the filter weights are shared
across all image positions, a layer of ten 3x3x3 filters
has only 280 parameters (27 weights plus one bias per filter), and this number stays the same even if the input image size
increases, which makes training deeper and larger networks possible.
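The parameter count and the sliding-window operation can be checked with a few lines of plain Python. This is an illustrative sketch, not the project's model code: `conv_layer_params` counts weights and biases of a convolutional layer, and `conv2d_valid` applies a single-channel filter without padding (as in most CNN libraries, the kernel is not flipped, so strictly speaking this is cross-correlation).

```python
def conv_layer_params(kernel_h, kernel_w, in_channels, num_filters):
    """Weights per filter plus one bias, times the number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * num_filters

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) with stride 1 and no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the image patch at (i, j).
            total = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total)
        output.append(row)
    return output

# Ten 3x3 filters over a 3-channel input: independent of the image size.
num_params = conv_layer_params(3, 3, 3, 10)

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d_valid(image, kernel)
print(num_params, feature_map)
```

The same ten filters would be reused unchanged on a larger image; only the size of the output feature maps would grow.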
Chapter:3 Statement of Problem
1. Design a deep learning model for testing of COVID-19 patients using chest X-ray

2. Dataset

• Overview of Dataset
• Preparing Data
I. Setting up Google Colaboratory
II. Getting COVID-19 chest X-ray images
III. Getting normal chest X-ray images
3. Preprocessing data
• Data Visualization
• Data Augmentation
4. Building CNN Model
5. Training models
• Training a Convolutional Neural Network from scratch
6. Result and Accuracy
Chapter:4 Solution Approach

1. Problem Formulation and Loss Function


We aim to classify a given frontal-view chest X-Ray image into the following
classes: Normal and COVID-19. We have trained our model in two configurations:
one that classifies into four classes, and another
with three classes (clubbing viral and bacterial pneumonia into one). The
motivation behind the four-class configuration is to better understand whether any
confusion between regular pneumonia and COVID-19 is due to the similarity
of pathology between COVID-19 and viral pneumonia.

2. Model Architecture

Our model consists of a pre-trained Sequential model that uses the Rectified
Linear Unit (ReLU) activation function, followed by a fully connected dense layer
with a sigmoid activation function.
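The final stage of this architecture can be sketched numerically: ReLU-activated features feed a dense unit whose sigmoid output is read as the probability of the COVID-19 class. The code below is a hand-written illustration with made-up weights, not the trained model. The source does not name the loss function; binary cross-entropy is shown only as an assumption, since it is the usual companion of a sigmoid output.

```python
import math

def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_sigmoid(features, weights, bias):
    """Fully connected unit followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Assumed loss: penalizes confident wrong probabilities heavily."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# Made-up pre-activations and weights, for illustration only.
hidden = relu([-1.0, 2.0])
probability = dense_sigmoid(hidden, weights=[0.5, 0.0], bias=0.0)
loss_if_covid = binary_cross_entropy(1, probability)
print(hidden, probability, loss_if_covid)
```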

4. Dataset and Evaluation

We use the covid-chestxray-dataset for COVID-19 frontal-view chest X-Ray
images and the chest-x-ray-normal dataset for frontal-view chest X-Ray
images of normal lungs. We use the pre-trained model, thus implicitly using
robust features obtained after training on the ChestX-ray14 dataset. The
covid-chestxray-dataset does not contain a proper train/test split, so we choose 20% of the images as the test
set, and 10 images are kept as the validation set.

         Normal   COVID-19   Total
Train    1266     545        1811
Test     317      167        484
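A hold-out split of this kind can be sketched with the standard library alone. The example below splits each class separately so that both classes appear in the test set; the per-class split, the fixed seed and the file names are illustrative assumptions rather than the procedure actually used, and the small validation set is not modelled.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and hold out `test_fraction` of them for testing."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical file lists standing in for the two image folders.
normal_files = [f"normal_{i}.png" for i in range(10)]
covid_files = [f"covid_{i}.png" for i in range(5)]

normal_train, normal_test = train_test_split(normal_files)
covid_train, covid_test = train_test_split(covid_files)

# Label 0 = Normal, label 1 = COVID-19.
train_set = [(f, 0) for f in normal_train] + [(f, 1) for f in covid_train]
test_set = [(f, 0) for f in normal_test] + [(f, 1) for f in covid_test]
print(len(train_set), len(test_set))
```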
5.Sampling
The combined dataset (covid-chestxray-dataset and chest-x-ray-normal) has a high data
imbalance due to scarce COVID-19 data. To ensure that the training loss due to COVID-19
does not get masked by the training loss due to the other class, we consider only a random
subset of normal data in each batch. The size of this subset must not be too small, since that
would lead to overfitting on the COVID-19 data. In each batch we take data from both classes,
Normal and COVID-19.
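The batch construction described above can be sketched as a generator that draws a fresh random subset of normal images for every batch and combines it with COVID-19 images. The batch sizes, the seed and the function name are hypothetical; the source does not give the subset size that was actually used.

```python
import random

def balanced_batches(covid_items, normal_items, covid_per_batch,
                     normal_per_batch, num_batches, seed=0):
    """Yield batches of (item, label) pairs with a random normal subset in each."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        # A new random subset of the majority class for every batch.
        normal_subset = rng.sample(normal_items, normal_per_batch)
        covid_subset = rng.sample(covid_items, covid_per_batch)
        batch = [(x, 0) for x in normal_subset] + [(x, 1) for x in covid_subset]
        rng.shuffle(batch)
        yield batch

# Hypothetical, deliberately imbalanced file lists.
normal_items = [f"normal_{i}.png" for i in range(100)]
covid_items = [f"covid_{i}.png" for i in range(10)]

batches = list(balanced_batches(covid_items, normal_items,
                                covid_per_batch=4, normal_per_batch=4,
                                num_batches=3))
print([len(b) for b in batches])
```

Because the normal subset is resampled each time, all normal images are eventually seen over many batches while no single batch is dominated by them.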
6. Testing and Evaluation
7. Results

Our results indicate that this approach can lead to COVID-19 detection from X-Ray
images with an AUROC (area under the ROC curve) of 0.9994 for the COVID-19 positive
class, and a mean AUROC of 0.9738 (for the 2-class classification configuration). Since we
have modeled the problem as a binary classification problem for each class, given an
input image X, we treat the class with the maximum score as the prediction when calculating
accuracy, sensitivity (recall) and the confusion matrix.
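The evaluation quantities named above can be computed as in the following sketch. The scores and labels are made-up toy values, not the reported results. AUROC is computed through its pairwise interpretation (the probability that a random positive image receives a higher score than a random negative one), and the predicted class is the one with the maximum score.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def predict(class_scores):
    """Pick the index of the class with the maximum score for each image."""
    return [max(range(len(s)), key=lambda c: s[c]) for s in class_scores]

def confusion_counts(labels, predictions):
    """Count true/false positives and negatives, treating class 1 as positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(labels, predictions):
        if y == 1:
            counts["tp" if p == 1 else "fn"] += 1
        else:
            counts["fp" if p == 1 else "tn"] += 1
    return counts

# Toy scores per image: (Normal score, COVID-19 score); label 1 = COVID-19.
class_scores = [(0.9, 0.2), (0.3, 0.8), (0.6, 0.4), (0.1, 0.7), (0.4, 0.6)]
labels = [0, 1, 1, 1, 0]

predictions = predict(class_scores)
counts = confusion_counts(labels, predictions)
accuracy = (counts["tp"] + counts["tn"]) / len(labels)
sensitivity = counts["tp"] / (counts["tp"] + counts["fn"])
covid_auroc = auroc(labels, [s[1] for s in class_scores])
print(counts, accuracy, sensitivity, covid_auroc)
```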

8. Comparison with Existing Systems


To evaluate the performance of the proposed technique, the outcome of
the study was compared with benchmark studies. The criterion for selecting a benchmark was that the
study used X-ray radiology images for the diagnosis of COVID-19. The comparison covers
the proposed technique and the benchmark studies in the literature that use X-ray images for the
diagnosis of COVID-19. Most of the previous studies had a very limited number of COVID-19
X-ray radiology images. Because the novel coronavirus (COVID-19) is a new pandemic, only limited open-source
X-ray radiology images are available for developing a deep-learning based automated diagnosis
model. Nevertheless, a huge number of X-ray images is available for other respiratory diseases.
The main contributions of the current study are:
1. The study does not suffer from data imbalance.
2. The model was trained using a large number of COVID-19 X-ray radiology images when
compared to the previous studies.
3. The proposed model is a fully automated diagnosis method and does not require any separate
feature extraction or annotation prior to the diagnosis.
4. Data augmentation was applied to increase the generalization of the proposed model.
5. The model outperforms the benchmark studies.
Despite the above-mentioned advantages, the study also suffers from some limitations:

1. The proposed system needs to be trained for other respiratory diseases. The current model only
diagnoses COVID-19 and healthy individuals and is unable to diagnose other kinds of
pneumonia and respiratory infections.
2. The number of COVID-19 X-ray radiology images needs to be increased for better model
training. The deep-learning model performance can be further enhanced with the increase in the
size of the data set.
3. The current study was based on the data set curated using several open-source chest X-ray
images. These samples were collected from various research publications or uploaded by
volunteers. Therefore, these X-ray images were not collected in a rigorous manner.
Chapter:5 Work Progress
1. Setting up Google Colaboratory
2. Getting datasets.
• Getting COVID-19 chest X-ray images
• Getting normal chest X-ray images
• Getting COVID-19 negative chest X-ray images

3. Data pre-processing
• Data Visualization
• Data augmentation
4. Building CNN Model
Chapter:6 Conclusion

We have presented some initial results on detecting COVID-19 positive cases from chest X-Rays using
a deep-learning model. We have demonstrated significant improvement in performance. The results
look promising, though the size of the publicly available dataset is small. We plan to further
validate our approach using larger COVID-19 X-ray image datasets and clinical trials. We used
transfer learning for automated COVID-19 diagnosis using X-ray images. The motivation for using X-ray
images for the diagnosis of COVID-19 is the lower sensitivity of the RT-PCR diagnostic test. The
proposed system achieved the highest sensitivity (100%) and specificity (99.3%) when compared to
the benchmark studies. The system can assist radiologists in the early diagnosis of COVID-19.
Generalization of the model was achieved by generating additional data through augmentation. Moreover, the
study attempted to use a large number of COVID-19 X-ray images by combining several open-source
data sets. Despite combining multiple open-source data sets, there is still a need for an increased
number of COVID-19 positive X-ray sample images. An increased number of COVID-19 X-ray
samples will enhance the model’s performance.
References
1. Hinton GE, Osindero S, Teh YW. A Fast Learning Algorithm for Deep Belief
Nets. Neural Computation. 2006; 18(7):1527-1554.
URL: https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf. Accessed 19 November 2018.
2. Krizhevsky A, Sutskever I, Hinton GE. ImageNet Classification with Deep Convolutional
Neural Networks. In: Neural Information Processing Systems Conference;
2012. Available from: Google Scholar.
URL: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf. Accessed 19 November 2018.
3. Cohen et al. COVID-19 image data collection.
URL: https://github.com/ieee8023/covid-chestxray-dataset. Accessed 13 April 2020.
