
Int. J. Computer Applications in Technology, Vol. 66, Nos. 3/4, 2021

Gradient and statistical features-based prediction system for COVID-19 using chest X-ray images

Anurag Jain and Shamik Tiwari*
School of Computer Sciences,
University of Petroleum and Energy Studies,
Dehradun, Uttarakhand, India
Email: anurag.jain@ddn.upes.ac.in
Email: shamik.tiwari@ddn.upes.ac.in
*Corresponding author

Tanupriya Choudhury and Bhupesh Kumar Dewangan
Department of Informatics,
School of Computer Science,
University of Petroleum and Energy Studies,
Dehradun, Uttarakhand, India
Email: tanupriya@ddn.upes.ac.in
Email: b.dewangan@ddn.upes.ac.in

Abstract: As per data available on the WHO website, the number of COVID-19 patients surpassed 8.7 million globally on 20 June 2020, and around 4.6 lakh (460,000) people have lost their lives. The most
common diagnostic test for COVID-19 detection is the Polymerase Chain Reaction (PCR) test. In
highly populated developing countries such as Brazil and India, there has been a severe shortage of
PCR test kits. Furthermore, the PCR test is very specific but has lower sensitivity. In this research
work, the authors have designed a decision support system based on statistical features and edge maps
of X-ray images to detect COVID-19 in a patient. Publicly available online data sets of chest X-ray
images have been used to train and test decision tree, K-nearest neighbour, random forest, and
multilayer perceptron machine learning classifiers. The experimental results show that
the multilayer perceptron achieved 94% accuracy, which is higher than the other classifiers.

Keywords: COVID-19; chest X-ray; statistical features; image gradient; random forest; KNN;
multilayer perceptron; decision tree.

Reference to this paper should be made as follows: Jain, A., Tiwari, S., Choudhury, T. and
Dewangan, B.K. (2021) ‘Gradient and statistical features-based prediction system for COVID-19
using chest X-ray images’, Int. J. Computer Applications in Technology, Vol. 66, Nos. 3/4,
pp.362–373.

Biographical notes: Anurag Jain is currently serving at the University of Petroleum and Energy
Studies, Dehradun as an Associate Professor in the School of Computer Science. He has been in
academia for the last 18 years. He has published around 50 research papers in renowned
journals and conferences. Fourteen students have successfully defended their MTech theses under his
guidance. His research areas are scheduling and load balancing in cloud
computing, healthcare, machine learning, and data science. Currently, he is guiding five PhD
students in their research work.

Shamik Tiwari is currently working as a Senior Associate Professor in the Department of Virtualisation
at SoCS, University of Petroleum and Energy Studies, Dehradun. He has around
17 years of experience as an academician. His research interests include digital image processing, computer vision,
biometrics, machine learning (especially deep learning) and health informatics. He has written many
national and international publications, including books, in these fields.

Tanupriya Choudhury is currently working as a Sr. Associate Professor in the School of Computer
Sciences at the University of Petroleum and Energy Studies, Dehradun. He has 10 years of
experience in teaching as well as in research. Recently he has received the Global Outreach

Copyright © 2021 Inderscience Enterprises Ltd.



Education Award for Excellence in Best Young Researcher Award in GOECA 2018. His areas of
interest include human computing, soft computing, cloud computing, data mining, etc. He has
filed 14 patents to date and received 16 copyrights from MHRD for his own software. He is a
lifetime member of IETA, a member of IEEE, and a member of IET (UK) and other renowned
technical societies.

Bhupesh Kumar Dewangan pursued his Bachelor of Technology degree from Pt Ravi Shankar
Shukla University (State University), Raipur and his Master of Technology from Chhattisgarh
Swami Vivekananda Technical University (State Technical University), Bhilai, in Computer
Science and Engineering. Currently, he is pursuing a PhD degree in Computer Science and
Engineering and is an Assistant Professor in the Department of Informatics at the University of
Petroleum and Energy Studies. He has more than 40 research publications in various international
journals and conferences with SCI/SCOPUS/UGC indexing. His research interests include
autonomic cloud computing, resource scheduling, software engineering and testing. He is a
member of various organisations such as ISTE and IAPFE. Currently, he is an editor of special issue
journals of the Inderscience and IGI publication houses, and an editor/author of two books from the Springer
and Elsevier publication houses.

1 Introduction

The novel coronavirus disease named COVID-19 has spread rapidly and has caused a global outbreak of respiratory illness. It first occurred in early December 2019 in Wuhan city of Hubei Province, China. There are multiple kinds of coronavirus that can cause respiratory disease. Respiratory disease can vary from a light cold to pneumonia, and generally symptoms are mild. However, a few types of coronavirus can cause severe disease, such as the Severe Acute Respiratory Syndrome (SARS) Coronavirus that was first found in China in 2003 and the Middle East Respiratory Syndrome (MERS) Coronavirus that was first discovered in Saudi Arabia in 2012. The novel coronavirus COVID-19 was first discovered in China in 2019 (Van Doremalen et al., 2020; Song et al., 2019). Coronaviruses are a big family of viruses that consist of genetic material surrounded by protein spikes, which give the look of a crown. In Latin, a crown is called 'corona' (Huang et al., 2020; Wu et al., 2020). This is how the virus got its name. The disease has spread from sick people to others in close contact, including family members, health workers and people who are living, studying or travelling together. Common signs of COVID-19 are mild to severe respiratory illness with fever, cough, and shortness of breath. In more severe cases, infection can cause kidney failure, pneumonia, heart failure, severe acute respiratory syndrome, central nervous problems and even death. So far, the global number of reported cases of COVID-19 has surpassed 8.7 million on 20 June 2020 (WORLDOMETERS, n.d.).

1.1 Motivation

COVID-19 can be diagnosed by history of epidemiology, clinical manifestation and lab tests. There is no particular medication for the virus, and treatment consists of supportive care. So far, there is no vaccination to protect from this virus. Treatment and vaccine are still in development. To detect the disease, doctors can use the Reverse Transcription Polymerase Chain Reaction (RT-PCR) test. However, this test fails to detect the virus during the starting stage. Doctors can also use a chest X-ray to detect COVID-19. At the beginning of the pandemic, the sensitivity of the RT-PCR test was between 30% and 70%, which was lower than that of the chest X-ray test. With the 2nd generation RT-PCR test, sensitivity increased up to 95% and the test became more effective relative to the chest X-ray. However, wait time and the unavailability of a sufficient number of labs are still issues associated with the RT-PCR test. To handle this situation and utilise the easily available massive radiographic infrastructure with well-trained staff, it is essential to design a system which can analyse a chest X-ray and predict COVID-19 (Li et al., 2020; Letko and Munster, 2020).

The structure of the manuscript is as follows: the introduction and history of the COVID-19 virus along with the motivation for this work have already been discussed in Section 1. Radiographic features of chest X-rays of COVID-19 infected patients and work done by other researchers on detecting COVID-19 from chest X-rays are discussed in Section 2. Section 3 contains the details of the methodology, data set and simulation environment. Results and their discussion are given in Section 4, which is followed by the conclusion and future scope in Section 5.

2 Related work

In this section, authors have discussed the radiographic features of chest X-rays of COVID-19 patients. How other researchers have designed prediction models for detection of COVID-19 through chest X-rays has also been discussed.

2.1 Radiographic features of COVID-19 affected chest X-ray

Analysis of chest X-rays of COVID-19 infected patients shows that, at the early stage, symptoms are mild and there are small ground glass opacities and nodules in the lungs. This is demonstrated in Figure 1.

Figure 1 Chest X-ray of COVID-19 infected patient at early stage showing multiple nodules and ground glass opacities in right upper and left lower lobe (Kong and Agarwal, 2020; Ng et al., 2020; Radiology Assistant, n.d.)

With the progress of the disease, symptoms become severe and this results in multiple ground glass opacities around the nodules. This is shown in Figure 2. Moreover, there are multiple bilateral pulmonary consolidations. Halo, reversed halo and paving stone signs appear around the nodules. This is shown in Figure 3.

Figure 2 Chest X-ray of COVID-19 infected patient at progressive stage (Kong and Agarwal, 2020; Ng et al., 2020; Radiology Assistant, n.d.)

Figure 3 Chest X-ray of COVID-19 infected patient at progressive stage (Kong and Agarwal, 2020; Ng et al., 2020; Radiology Assistant, n.d.)

At the severe stage, there are many diffuse lesions in both lungs, pulmonary fibrosis forms and the lungs become white. This is shown in Figure 4.

Figure 4 Chest X-ray of COVID-19 infected patient at severe stage (Kong and Agarwal, 2020; Ng et al., 2020; Radiology Assistant, n.d.)

2.2 Contribution by other researchers

Ghoshal and Tucker (2020) designed a deep learning-based model to estimate uncertainty in computer-based COVID-19 detection from chest X-rays. As uncertainty is strongly correlated with the accuracy of a prediction system, the authors tried to quantify the uncertainty of the prediction system. They used a drop-weights-based Bayesian Convolutional Neural Network to estimate the level of uncertainty in any deep learning-based COVID-19 detection system from chest X-rays. This can improve clinicians' trust in the capability of computer-based COVID-19 diagnosis systems. Apostolopoulos and Mpesiana (2020) proposed a COVID-19 detection model using chest X-rays. The model was developed using deep learning with CNNs (specifically a transfer learning procedure). For training of the model, the authors used chest X-ray images of pneumonia patients, confirmed COVID-19 patients and normal persons. These images are publicly available in medical repositories. Through simulation results, the authors claimed 96.78% accuracy in finding COVID-19 patients. Rajaraman et al. (2020) designed a COVID-19 detection system from chest X-rays using an ensemble of iteratively pruned deep learning models. The authors applied modality-specific transfer learning on chest X-rays for training and testing of the model. Through iterative pruning, they were able to select the best-pruned models. Further, to improve performance, they assigned weights to the pruned models and constructed an ensemble of the best-pruned models. With the ensemble of weighted pruned models, the authors claimed to achieve 99.01% accuracy. Hall et al. (2020) discussed the significance of chest X-ray images in detection of COVID-19. The authors developed a deep learning model as an ensemble of pre-trained ResNet 50, VGG 16 and a CNN. For training and testing, they used publicly available chest X-ray images. Through the ensemble of three different classifiers, the authors claimed to achieve 91.24%
accuracy. Khan et al. (2020) designed a model to classify COVID-19, viral pneumonia, bacterial pneumonia and normal patients from chest X-ray images. The model was designed using a transfer-learning method with a deep convolutional neural network on publicly available chest X-ray images. The authors claimed to achieve 89.6% overall accuracy. Bassi and Attux (2020) proposed a chest X-ray classification model for detection of COVID-19. They developed the model using a DenseNet121 CNN with chest X-ray images of COVID-19 patients, pneumonia patients and normal persons. The model was trained twice, first on ImageNet and second on a chest X-ray database. Through a simulation study, the authors claimed to achieve 97.8% test accuracy for the COVID-19 class.

In this work, authors have proposed a hybrid features-based classification model to acquire more detailed information from X-ray data. The detailed information fetched from X-ray images has been used in the classification task. The main contributions of the proposed work are as follows:

1) Design a hybrid feature set by combining statistical features and edge-based features computed from radiography images.

2) Design multiclass classification models using the hybrid feature set. For multiclass classification, three classes are considered, namely Viral Pneumonia, Normal and COVID-19.

3) Compare the performance of the various classification models and propose the best classification model for the computed feature set.

3 Methodology

In this section, the methods used for finding statistical and edge-based features of an image are described in detail. Details of the image data set and the simulation environment used for implementing the work are also discussed.

3.1 Statistical and edge-based image features

Edge detection is used to find object boundaries in an image. It is the first step in object recognition. It works by finding abrupt changes in intensity. In this work, statistical features from the image histogram and edge-based features are computed as handcrafted features for classification. Statistical features consist of mean, standard deviation, skewness and kurtosis (Tiwari, 2020, 2017). These features can be defined as follows.

Let z be a random variable representing grey levels and let p(z_i), i = 0, 1, 2, ..., L-1, be the corresponding histogram, where L is the number of distinct grey levels.

Mean: It measures the average intensity of the image.

$$ m = \sum_{i=0}^{L-1} z_i \, p(z_i) \tag{1} $$

Standard deviation: It measures the contrast of the grey image.

$$ \sigma = \sqrt{\sum_{i=0}^{L-1} (z_i - m)^2 \, p(z_i)} \tag{2} $$

Skewness: It measures the 3rd order moment of z about the mean.

$$ S = \sum_{i=0}^{L-1} (z_i - m)^3 \, p(z_i) \tag{3} $$

Kurtosis: It measures the 4th order moment of z about the mean.

$$ K = \sum_{i=0}^{L-1} (z_i - m)^4 \, p(z_i) \tag{4} $$

An image and its histogram are shown in Figures 5(a) and 5(b).

Figure 5 An X-ray image in (a) and its histogram in (b)

After computing the statistical features, edge maps are computed using the Canny edge operator, the Laplacian edge operator, the Sobel edge operator in the x- and y-directions, the Prewitt edge operator in the x- and y-directions, and a binary edge map obtained by thresholding. The effect of these operators on the image of Figure 5 is shown in Figures 6(a) to 6(i). All these edge maps and statistical features are arranged into a feature vector for classification. The methodology of the proposed work is as follows:

Methodology: COVID-19 prediction from chest X-ray

Input: Chest X-ray images of COVID-19 patients, pneumonia patients and normal persons from the freely available online data set.
Output: Model to detect possibility of COVID-19
Step 1: Calculate the statistical feature vector SF of the input image using the Statistical_Feature_Extraction() algorithm.
Step 2: Calculate the edge-based feature vector EF of the input image using the Edge_Feature_Extraction() algorithm.
Step 3: Combine the statistical features and edge-based features of each image to form a data set and divide it into train and test
portions in the ratio 70:30.
Step 4: Train the different models on the training data generated in step 3.
Step 5: Test the performance of the trained models on the test data generated in step 3.
Step 6: Analyse the results generated in step 5 on appropriate parameters to identify the best model and use it for detection of
COVID-19.
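Step 3 of the methodology can be illustrated with a minimal sketch. It assumes a plain random shuffle with a fixed seed; the paper does not state whether the split was stratified or which tool produced it, so the function name `split_70_30` and the seed are purely illustrative. The 2905 dummy samples correspond to the 2033 + 872 images reported for the training and test subsets.

```python
import random


def split_70_30(samples, train_percent=70, seed=0):
    """Shuffle the samples and split them into train and test portions.

    Integer arithmetic is used for the split point so that the result
    does not depend on floating-point rounding.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability (assumption)
    n_train = len(shuffled) * train_percent // 100
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical stand-in for the combined SF + EF feature vectors:
# one dummy id per image in the data set.
samples = list(range(2905))
train, test = split_70_30(samples)
print(len(train), len(test))  # 2033 872
```

In practice each sample would carry its class label, and a stratified split would keep the class proportions equal in both portions.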

Algorithm: Statistical_Feature_Extraction()
Input: Grey scale image g(x, y)
Output: Feature vector SF
Step 1: Compute the histogram h(i) of g(x, y), where i denotes the grey levels.
Step 2: Calculate the mean of h(i)
Step 3: Calculate the standard deviation of h(i)
Step 4: Calculate the kurtosis of h(i)
Step 5: Calculate the skewness of h(i)
Step 6: Concatenate all the features obtained in steps 2–5 to form the statistical feature vector SF.
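The statistical feature extraction can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions rather than the authors' implementation: the image is taken to be a list of rows of integer grey levels in the range 0 to `levels - 1`, the histogram is normalised so that it sums to 1, and the function name `statistical_features` is hypothetical. Skewness and kurtosis are computed as the plain 3rd and 4th central moments, as in Section 3.1.

```python
import math


def statistical_features(image, levels=256):
    """Return SF = [mean, standard deviation, kurtosis, skewness].

    `image` is a list of rows of integer grey levels (assumed format).
    """
    pixels = [value for row in image for value in row]
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    # Normalised histogram p(z_i): fraction of pixels at grey level i
    p = [c / len(pixels) for c in counts]

    mean = sum(z * p[z] for z in range(levels))
    std = math.sqrt(sum((z - mean) ** 2 * p[z] for z in range(levels)))
    skewness = sum((z - mean) ** 3 * p[z] for z in range(levels))
    kurtosis = sum((z - mean) ** 4 * p[z] for z in range(levels))
    # Same order as steps 2-5 of the algorithm
    return [mean, std, kurtosis, skewness]


# Tiny illustrative image (not taken from the data set)
tiny = [[0, 2],
        [2, 4]]
print(statistical_features(tiny))
```

A numerical library would normally replace the explicit loops; they are kept here so that each line can be matched against the corresponding moment definition.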

Algorithm: Edge_Feature_Extraction()
Input: Grey scale image g(x, y)
Output: Edge vector EF
Step 1: Compute the gradient in x-direction using the Prewitt operator.
Step 2: Compute the gradient in y-direction using the Prewitt operator.
Step 3: Compute the gradient in x-direction using the Sobel operator.
Step 4: Compute the gradient in y-direction using the Sobel operator.
Step 5: Compute the edge map of g(x, y) using the Canny edge operator.
Step 6: Compute the edge map of g(x, y) using the Laplacian edge operator.
Step 7: Compute the edge map of g(x, y) using binary thresholding.
Step 8: Concatenate all the features obtained in steps 1–7 to form the edge feature vector EF.
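The edge feature extraction can be sketched as follows, again as a dependency-free illustration rather than the authors' code. Several choices are assumptions: the standard 3 × 3 Prewitt, Sobel and 4-neighbour Laplacian kernels are used, filtering is done by correlation on interior pixels only, and the binary threshold value is a free parameter because the paper does not report one. Canny edge detection (step 5) is a multi-stage procedure and is omitted here; an image-processing library would normally supply it. The names `filter3x3` and `edge_features` are hypothetical.

```python
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def filter3x3(image, kernel):
    """Correlate a 2-D list image with a 3x3 kernel (interior pixels only)."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            row.append(sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
                           for i in range(3) for j in range(3)))
        out.append(row)
    return out


def edge_features(image, threshold=128):
    """Concatenate flattened edge maps into one feature vector EF.

    Order: Prewitt x, Prewitt y, Sobel x, Sobel y, Laplacian, binary map.
    The Canny map is left out of this sketch.
    """
    maps = [filter3x3(image, k)
            for k in (PREWITT_X, PREWITT_Y, SOBEL_X, SOBEL_Y, LAPLACIAN)]
    # Binary map: 1 where the grey level reaches the (assumed) threshold
    maps.append([[1 if v >= threshold else 0 for v in row] for row in image])
    return [v for m in maps for row in m for v in row]


# Illustrative 4x4 image with a vertical step edge
step = [[0, 0, 10, 10] for _ in range(4)]
ef = edge_features(step, threshold=5)
print(len(ef))  # 5 maps of 2x2 values plus one 4x4 binary map = 36
```

On the step image the x-direction operators respond strongly while the y-direction operators return zero, which is the behaviour expected of directional gradient operators.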

Figure 6 An X-ray image and its edge maps using different gradient operators (a) Original image, (b) Edge map using Laplacian gradient operator, (c) Edge map using Canny edge operator, (d) Edge map using Sobel gradient operator in x-direction, (e) Edge map using Sobel gradient operator in y-direction, (f) Edge map using Sobel gradient operator in x- and y-direction, (g) Binary threshold image, (h) Edge map using Prewitt gradient operator in x-direction, (i) Edge map using Prewitt gradient operator in y-direction

3.2 Image data set

The COVID-19 X-ray image data set used in this work was collected from the following web sources: Italian Society of Medical and Interventional Radiology, Radiological Society of North America and Radiopaedia (Dadario, 2020). The data set includes 219, 1345 and 1341 images of confirmed COVID-19 pneumonia, common bacterial pneumonia and normal X-rays, respectively. The data set is divided into training and test subsets in the ratio 70:30, i.e. the numbers of images in the two subsets are 2033 and 872, respectively. Sample images from each class are shown in Figure 7. All the images are resized to 128 × 128 and converted to grey scale before feature extraction.

3.3 Simulation environment details

Four different classifiers are designed using the feature map as discussed in the methodology section. The details of each experiment are provided in the following subsections. To measure the performance of each classification model, precision, recall (sensitivity) and F-score are computed from the confusion matrix.

Figure 7 Sample chest X-ray images (a) Chest X-ray of COVID-19 patient, (b) Chest X-ray of normal person and (c) Chest X-ray of viral pneumonia patient

3.3.1 Random forest classification model for COVID-19 detection

Random forest is a supervised learning-based ensemble classifier, which is capable of performing both regression and classification tasks. The strength of this classifier is that it can handle missing values, can handle large data sets with high dimensionality and is less prone to overfitting than a single decision tree. A random forest classifier consists of multiple decision trees which are independent in nature but operate as an ensemble. Each individual decision tree gives its own class as output, and the class which is in the majority among the outputs of all decision trees becomes the class predicted by the random forest (Noi and Kappas, 2018; Kotsiantis et al., 2007). While experimenting, authors have used 100 decision trees with a maximum depth of 20.

3.3.2 Decision tree classification model for COVID-19 detection

Decision tree is a supervised learning-based tree-structured classifier, which has two types of nodes:

• Decision node: It specifies a test or choice on some feature, with one branch for each outcome.
• Leaf node: It indicates the predicted value or class.

The inductive bias of the decision tree classifier is to prefer smaller trees, i.e. either the depth should be low or the number of nodes should be small. When to stop and which attribute to split on are some of the factors that can affect the performance of decision tree classification. The concept of information gain is useful while selecting a feature to split on. Information gain is a measure of how much uncertainty a split removes. It is calculated from entropy, which is a measure of uncertainty. For a two-class problem the entropy lies in [0, 1]; in general it is at most log2 n for n classes. The two quantities are expressed as follows:

$$ E(S) = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{5} $$

$$ IG(S, A) = E(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} \, E(S_v) \tag{6} $$

Here, S is the given training set, A is the feature selected for the split, S_v is the subset of S for which A takes the value v, and p_i represents the proportion of examples of class i. While experimenting, the default settings of the decision tree classifier were used (Jain et al., 2019).

3.3.3 K-NN classification model for COVID-19 detection

K-NN is a supervised learning-based algorithm of the lazy kind. It is called a lazy algorithm because no model is trained; the training data set is stored in a data structure and used at the time of prediction. The available data set is represented as points (x_i, y_i), where x_i represents the independent features and y_i represents the dependent feature. At the time of prediction, we are given some value x_j and apply a function f to x_j to find the corresponding y_j. We choose the k points {(x_1, y_1), (x_2, y_2), ..., (x_k, y_k)} which are closest to x_j and then predict the value of y_j. In the case of classification, we choose the most frequent class among the k nearest y_i's, and in the case of regression we take the average of the selected y_i's to find the value of y_j (Kotsiantis et al., 2007). While experimenting, authors have used the ball tree algorithm to compute the nearest neighbours and have chosen the three nearest neighbours.

3.3.4 MLP model for COVID-19 detection

The multilayer perceptron algorithm is used for supervised learning problems. A multilayer perceptron consists of a single input layer, one or more hidden layers and one output layer. It is a kind of feed-forward network, which is represented as a finite acyclic graph. Its nodes are neurons with non-linear activation functions. The function of the input layer is to receive the signal, while the output layer makes a prediction or classification based on the input. Consecutive layers are fully connected, i.e. every node in the k-th layer is connected to every node in the (k+1)-th layer. All connections between the layers are weighted. If the input has n dimensions then there must be n neurons in the input layer.

$$ Z^{(i)} = \sigma\left( W^{(i)T} X + b^{(i)} \right) \tag{7} $$

Here, X represents the input to the unit, W holds the weights of the i-th layer, b is the bias of the i-th layer, Z is the output of the i-th layer and σ is a non-linear activation function, e.g. the sigmoid function; σ can be the same for different layers or it may differ (Jain et al., 2019).

While experimenting, authors have used an MLP model which consists of three dense layers, with 128 neurons in each of the first two layers and three neurons in the last dense layer. The first and second dense layers are followed by ReLU activation, while the last dense layer uses softmax activation. For regularisation, a dropout layer with a dropout rate of 0.15 is used after the first and second dense layers. The model is optimised using Root Mean Square propagation (RMSprop) with the categorical cross-entropy loss function (Jain et al., 2018). The model is trained for 50 epochs with batch size 32.

4 Result and discussion

4.1 Results

Figures 8(a), 8(b), 8(c) and 8(d) show the confusion matrices for the random forest, decision tree, K-nearest neighbour and multilayer perceptron classification models on the test data. A confusion matrix shows the performance of a classifier on test data diagrammatically. Tables 1 and 2 show the precision, sensitivity and F-score performance metrics calculated from the confusion matrices shown in Figure 8 for random forest, decision tree, K-nearest neighbour and multilayer perceptron, respectively.
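As an illustration of how a confusion matrix of the kind reported in the results is built, the sketch below counts (true label, predicted label) pairs. The convention of rows for true classes and columns for predicted classes is an assumption, and the labels and predictions in the example are invented for illustration; they are not results from this study.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions per (true class, predicted class) pair.

    Rows index the true class and columns the predicted class
    (assumed convention).
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, prediction in zip(y_true, y_pred):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


labels = ["COVID-19", "Normal", "Viral Pneumonia"]
# Invented toy labels, for illustration only
y_true = ["COVID-19", "COVID-19", "Normal", "Normal",
          "Viral Pneumonia", "Viral Pneumonia"]
y_pred = ["COVID-19", "Normal", "Normal", "Normal",
          "Viral Pneumonia", "COVID-19"]

cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)
```

The diagonal entries count correctly classified images, and every off-diagonal entry shows which class an image was mistaken for.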

Figure 8 Confusion matrix representing true and predicted labels separately for each classification model, (a) Confusion matrix for random forest classification model, (b) Confusion matrix for decision tree classification model, (c) Confusion matrix for k-nearest neighbour algorithm classification model, (d) Confusion matrix for multilayer perceptron classification model

Table 1 Results in terms of performance metrics using random forest (RF) and decision tree (DT) classification models

                     COVID-19 detection using RF          COVID-19 detection using DT
Class/Metrics        Precision   Sensitivity   F-score    Precision   Sensitivity   F-score
COVID-19             1.00        0.43          0.65       0.49        0.54          0.51
Normal               0.93        0.89          0.91       0.78        0.78          0.78
Viral Pneumonia      0.92        0.91          0.92       0.79        0.79          0.79
Macro Average        0.95        0.74          0.84       0.69        0.70          0.69
Micro Average        0.93        0.87          0.90       0.76        0.76          0.76
Weighted Average     0.93        0.87          0.89       0.77        0.76          0.72
Accuracy             0.87                                 0.76

Table 2 Results in terms of performance metrics using K-nearest neighbour (KNN) and multilayer perceptron (MLP) classification models

                     COVID-19 detection using KNN         COVID-19 detection using MLP
Class/Metrics        Precision   Sensitivity   F-score    Precision   Sensitivity   F-score
COVID-19             0.76        0.80          0.78       0.88        0.89          0.89
Normal               0.98        0.49          0.65       0.95        0.94          0.94
Viral Pneumonia      0.69        0.98          0.81       0.94        0.94          0.94
Macro Average        0.81        0.76          0.75       0.92        0.92          0.92
Micro Average        0.76        0.76          0.76       0.94        0.94          0.94
Weighted Average     0.82        0.76          0.74       0.94        0.94          0.94
Accuracy             0.76                                 0.94

4.2 Discussion and analysis

Precision is a measure of consistency. It represents the ability of a classifier to return only relevant cases, i.e. what percentage of the cases predicted as positive is actually correct. Mathematically, it can be represented by the following formula:

$$ P = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \quad \text{or} \quad \frac{\text{True Positive}}{\text{Total Predicted Positive}} \tag{8} $$

So if the value of P is x%, then a positive prediction made by the classifier will be correct x% of the time. Figure 9 shows the comparison of different classifiers on the scale of precision for different classes. It has been found that the random forest and multilayer perceptron classifiers show good ability in finding the relevant class.

Figure 9 Precision analysis for different classifiers

Sensitivity or recall represents the percentage of actual positive cases that were predicted as positive. It represents the ability of a classifier to recognise all relevant cases correctly. Mathematically, it can be represented by the following formula:

$$ R = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \quad \text{or} \quad \frac{\text{True Positive}}{\text{Total Actual Positive}} \tag{9} $$

Figure 10 shows the comparison of different classifiers on the scale of recall for different classes. It has been found that the multilayer perceptron classifier shows the best ability to find all classes correctly.

Figure 10 Recall analysis for different classifiers
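The per-class metrics can be computed directly from a multiclass confusion matrix, as the following sketch shows. It assumes rows are true classes and columns are predicted classes, treats each class in turn as the positive class, computes the F-score as the harmonic mean of precision and recall, and takes accuracy as the fraction of samples on the diagonal. The function name `per_class_metrics` and the example matrix are illustrative only; the numbers are not results from this study.

```python
def per_class_metrics(matrix):
    """Return ([(precision, recall, f1) per class], overall accuracy)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[r][k] for r in range(n)) - tp  # predicted k, truly other
        fn = sum(matrix[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((precision, recall, f1))
    accuracy = sum(matrix[k][k] for k in range(n)) / total
    return results, accuracy


# Invented 3-class confusion matrix, for illustration only
example = [[1, 1, 0],
           [0, 2, 0],
           [1, 0, 1]]
metrics, accuracy = per_class_metrics(example)
for precision, recall, f1 in metrics:
    print(round(precision, 2), round(recall, 2), round(f1, 2))
print(round(accuracy, 2))
```

Macro averages are the unweighted means of these per-class values, while weighted averages weight each class by its number of true samples.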

There is usually a trade-off between precision and recall: as recall increases, precision tends to decrease. Still, both precision and sensitivity reflect the quality of the model. Which of the two should be maximised depends upon the problem. However, there is another metric through which we can consider both precision and recall at the same time; it is called the F1 score. Instead of balancing two metrics, the aim is to maximise the single parameter F1 score. The F1 score is the harmonic mean of precision and recall.

$$ F1\ \text{score} = 2 \cdot \frac{P \cdot R}{P + R} $$

Figure 11 shows the comparison of different classifiers on the scale of F1 for different classes. It has been found that the multilayer perceptron classifier shows the best ability to balance precision and recall.

Figure 11 F1 analysis for different classifiers

Accuracy represents the correctness of a classifier in predicting the correct class. A higher value is always desirable. Mathematically, it can be represented as follows:

$$ A = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{False Positive} + \text{True Negative} + \text{False Negative}} $$

Figure 12 shows the comparison of different classifiers on the scale of accuracy. The multilayer perceptron classifier has shown the best accuracy relative to the other classifiers, with a maximum accuracy of 94% among the four classifiers.

Figure 12 Accuracy analysis for different classifiers

The Receiver Operating Characteristic (ROC) curve is another way to analyse the performance of different classifiers. It depicts how the true positive rate (recall) and the false positive rate change as the threshold value for identifying a positive case changes. This helps in choosing a suitable operating threshold. By calculating the area under the ROC curve, we can judge the performance of a classifier. This area lies between 0 and 1; the closer it is to 1, the better the classifier. To further strengthen and validate the results, the ROC curve is plotted for all four models. This is shown in Figures 13(a), 13(b), 13(c) and 13(d), corresponding to the random forest, decision tree, k-nearest neighbour and multilayer perceptron models. The high Area under the ROC Curve (AUC) score not only depicts the strength of the multilayer perceptron model but also validates the results shown in Tables 1 and 2.
Figure 13 Receiver operating characteristic curves separately for each classification model (a) ROC curve for random forest classification model, (b) ROC curve for decision tree classification model, (c) ROC curve for k-nearest neighbour algorithm classification model, (d) ROC curve for multi-layer perceptron classification model
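The ROC analysis can be illustrated with a short sketch that builds the curve for one class against the rest and integrates it with the trapezoidal rule. An ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered. The binary labels and scores below are invented for illustration, and the one-vs-rest treatment of the three classes is an assumption, since the paper does not describe how its multiclass curves were produced.

```python
def roc_auc(labels, scores):
    """Return (ROC points, AUC) for binary labels (1 = positive class).

    `scores` are classifier confidences; a higher score means the
    sample is more likely to be positive.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    # Trapezoidal rule over (false positive rate, true positive rate)
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc


# Invented example: 1 marks the class of interest, 0 marks the rest
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points, auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 1.0 means every positive sample is scored above every negative one, while 0.5 corresponds to random ranking.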

5 Conclusions

The effects of the COVID-19 pandemic will continue until vaccination is available. Owing to insufficient healthcare facilities in highly populated developing countries such as Brazil and India, the situation is worsening day by day. In the current scenario, medical practitioners are therefore trying to use the available radiography infrastructure to detect COVID-19. The work proposed in this paper is intended to help healthcare workers. Image processing functions have been used to extract statistical and edge-based features from publicly available chest X-ray images of COVID-19 patients, pneumonia patients and normal persons. 70% of the data was used to train decision tree, random forest, k-nearest neighbour and multilayer perceptron classifiers, and the remaining 30% was used to test the trained models. The multilayer perceptron classifier achieved 94% accuracy in detecting COVID-19 from chest X-rays, the highest among the four classifiers.

To continue supporting healthcare professionals, the authors have started work on detecting COVID-19 from chest CT-scan images.
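The evaluation protocol summarised in this section (a 70%/30% train/test split followed by classifier accuracy on the held-out part) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The feature vectors are synthetic random numbers standing in for the statistical and edge-based features, only a k-nearest neighbour classifier is shown out of the four models, and all function names, the value k=5 and the random seeds are assumptions made for illustration. The accuracy it prints says nothing about real X-ray data.

```python
import math
import random
from collections import Counter


def make_synthetic_features(n_per_class=60, n_features=6, seed=0):
    """Return (features, labels) for three synthetic classes.

    Stand-in for the statistical and edge-based feature vectors; the values
    are random numbers, not measurements from real X-ray images.
    """
    rng = random.Random(seed)
    classes = ["covid", "pneumonia", "normal"]  # the three groups named in the paper
    X, y = [], []
    for idx, name in enumerate(classes):
        for _ in range(n_per_class):
            # Each class is a Gaussian blob centred at 3 * idx in every feature.
            X.append([rng.gauss(3.0 * idx, 1.0) for _ in range(n_features)])
            y.append(name)
    return X, y


def split_70_30(X, y, train_fraction=0.7, seed=0):
    """Shuffle the samples and return X_train, X_test, y_train, y_test."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(indices)))
    train, test = indices[:cut], indices[cut:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def knn_predict(X_train, y_train, sample, k=5):
    """Label `sample` by majority vote among its k nearest training points."""
    nearest = sorted(
        (math.dist(sample, x), label) for x, label in zip(X_train, y_train)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


X, y = make_synthetic_features()
X_train, X_test, y_train, y_test = split_70_30(X, y)
predictions = [knn_predict(X_train, y_train, s) for s in X_test]
test_accuracy = accuracy(y_test, predictions)
print(f"train={len(X_train)} test={len(X_test)} accuracy={test_accuracy:.2f}")
```

In practice, the same split would be reused for every classifier so that the reported accuracies are directly comparable.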
