3/4, 2021
Abstract: According to data available on the WHO website, the number of COVID-19 patients had
surpassed 8.7 million globally by 20 June 2020, and around 4.6 lakh (460,000) people had lost their
lives. The most common diagnostic test for COVID-19 detection is the Polymerase Chain Reaction
(PCR) test. In highly populated developing countries like Brazil, India, etc., there has been a severe
shortage of PCR test kits. Furthermore, the PCR test is very specific but has lower sensitivity. In this
research work, the authors have designed a decision support system based on statistical features and
edge maps of X-ray images to detect COVID-19 in a patient. Chest X-ray image data sets available
online have been used to train and test decision tree, K-nearest neighbour, random forest and
multilayer perceptron machine learning classifiers. The experimental results show that the
multilayer perceptron achieved 94% accuracy, which is higher than the other classifiers.
Keywords: COVID-19; chest X-ray; statistical features; image gradient; random forest; KNN;
multilayer perceptron; decision tree.
Reference to this paper should be made as follows: Jain, A., Tiwari, S., Choudhury, T. and
Dewangan, B.K. (2021) ‘Gradient and statistical features-based prediction system for COVID-19
using chest X-ray images’, Int. J. Computer Applications in Technology, Vol. 66, Nos. 3/4,
pp.362–373.
Biographical notes: Anurag Jain is currently serving at the University of Petroleum and Energy
Studies, Dehradun as an Associate Professor in the School of Computer Science. He has been in
academia for the last 18 years. He has published around 50 research papers in renowned
journals and conferences. 14 students have successfully defended their MTech thesis under his
guidance. His research area is in the domain of scheduling and load balancing in cloud
computing, healthcare, machine learning, and data science. Currently, he is guiding 5 PhD
students in their research work.
Shamik Tiwari is currently working as a Senior Associate Professor in the Department of Virtualisation
at SoCS, University of Petroleum and Energy Studies, Dehradun. He has rich experience of around
17 years as an academician. His research interests include digital image processing, computer vision,
bio-metrics, machine learning especially deep learning and health informatics. He has written many
national and international publications including books in these fields.
Tanupriya Choudhury is currently working as Sr. Associate Professor in the School of Computer
Sciences at the University of Petroleum and Energy Studies, Dehradun. He has 10 years’
experience in teaching as well as in research. Recently he has received Global Outreach
Education Award for Excellence in Best Young Researcher Award in GOECA 2018. His areas of
interest include human computing, Soft computing, cloud computing, data mining, etc. He has
filed 14 patents to date and received 16 copyrights from MHRD for his own software. He is a
lifetime member of IETA, a member of IEEE, and a member of IET (UK) and other renowned
technical societies.
Bhupesh Kumar Dewangan pursued a Bachelor of Technology degree from Pt Ravi Shankar
Shukla University (State University), Raipur and a Master of Technology from Chhattisgarh
Swami Vivekananda Technical University (State Technical University), Bhilai, in Computer
Science and Engineering. Currently, he is pursuing a PhD degree in Computer Science and
Engineering and is an Assistant Professor, Department of Informatics, at the University of
Petroleum and Energy Studies. He has more than 40 research publications in various international
journals and conferences with SCI/SCOPUS/UGC indexing. His research interests include
autonomic cloud computing, resource scheduling, software engineering and testing. He is a
Member of various organisations like ISTE, IAPFE, etc. Currently, he is an editor in special issue
journals of Inderscience & IGI publication house, and an editor/author of two books of Springer
& Elsevier publication house.
Figure 4 Chest X-ray of COVID-19 infected patient at severe stage (Kong and Agarwal, 2020; Ng et al., 2020;
Radiology Assistant, n.d.)
2.1 Radiographic features of COVID-19 affected chest X-ray
Analysis of chest X-rays of COVID-19 infected patients shows that, at the early stage, symptoms are
mild and there are small ground-glass opacities and nodules in the lungs. This is demonstrated in
Figure 1.
accuracy. Khan et al. (2020) designed a model to classify COVID-19, viral pneumonia, bacterial
pneumonia and normal patients from chest X-ray images. The model was designed using a
transfer-learning method with a deep convolutional neural network and publicly available chest
X-ray images. The authors claimed to achieve 89.6% overall accuracy. Bassi and Attux (2020)
proposed a chest X-ray classification model for the detection of COVID-19. They developed the
model using the DenseNet121 CNN and chest X-ray images of COVID-19 patients, pneumonia
patients and normal persons. The model was trained twice, first on ImageNet and then on a chest
X-ray database. Through a simulation study, the authors claimed to achieve 97.8% test accuracy for
the COVID-19 class. In this work, the authors propose a hybrid features-based classification model
to acquire more detailed information from X-ray data. The detailed information extracted from
X-ray images is used in the classification task. The main contributions of the proposed work are as
follows:
1) Design a hybrid feature set by combining statistical features and edge-based features computed
from radiography images.
2) Design the multiclass classification models using the hybrid feature set. For multiclass
classification, three classes
3.1 Statistical and edge-based image features
Edge detection is used to find object boundaries in an image. It is the first step in object recognition.
It works by finding abrupt changes in intensity. In this work, statistical features from the image
histogram and edge-based features are computed as handcrafted features for classification. The
statistical features consist of mean, standard deviation, skewness and kurtosis (Tiwari, 2020, 2017).
These features can be defined as follows.
Let $z$ be a random variable representing grey levels and let $p(z_i)$, $i = 0, 1, 2, \ldots, L-1$, be
the corresponding histogram, where $L$ is the number of distinct grey levels.
Mean: It measures the average intensity of the image.
\[ m = \sum_{i=0}^{L-1} z_i \, p(z_i) \tag{1} \]
Standard deviation: It measures the contrast of the grey image.
\[ \sigma = \sqrt{\sum_{i=0}^{L-1} (z_i - m)^2 \, p(z_i)} \tag{2} \]
366 A. Jain et al.
After computing the statistical features, edge maps are computed using the Canny edge operator, the
Laplacian edge operator, the Sobel edge operator in the x-direction and y-direction, the Prewitt edge
operator in the x-direction and y-direction, and a binary edge map using thresholding. The effect of
these operators on the image shown in Figure 5 is shown in Figures 6(a) to 6(i). All these edge maps
and statistical features are arranged into a feature vector for classification. The methodology of the
proposed work is as follows:
Algorithm: Statistical_Feature_Extraction()
Input: Grey Scale Image g(x, y)
Output: Feature Vector SF
Step 1: Compute the histogram h(i) of g(x, y), where i denotes the grey levels.
Step 2: Calculate the mean of h(i).
Step 3: Calculate the standard deviation of h(i).
Step 4: Calculate the kurtosis of h(i).
Step 5: Calculate the skewness of h(i).
Step 6: Concatenate all the features obtained in steps 2–5 to form the statistical feature vector SF.
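The steps of Statistical_Feature_Extraction() can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function name `statistical_features` and the `levels` parameter are hypothetical, the image is assumed to be a list of rows of integer grey levels, and skewness and kurtosis are assumed to be the standardised third and fourth central moments of the histogram, since their defining equations are not reproduced above.

```python
import math


def statistical_features(image, levels=256):
    """Return SF = [mean, std, kurtosis, skewness] of the grey-level histogram.

    `image` is a list of rows; each pixel is an int in [0, levels).
    """
    pixels = [v for row in image for v in row]
    n = len(pixels)

    # Step 1: normalised histogram p(z_i), so that the entries sum to 1.
    p = [0.0] * levels
    for v in pixels:
        p[v] += 1.0 / n

    # Step 2: mean, as in Equation (1).
    mean = sum(z * p[z] for z in range(levels))

    # Step 3: standard deviation, as in Equation (2).
    variance = sum((z - mean) ** 2 * p[z] for z in range(levels))
    std = math.sqrt(variance)

    # Steps 4-5: ASSUMPTION: standardised 4th and 3rd central moments.
    if std == 0.0:
        kurtosis = skewness = 0.0  # flat image: higher moments undefined
    else:
        kurtosis = sum((z - mean) ** 4 * p[z] for z in range(levels)) / std ** 4
        skewness = sum((z - mean) ** 3 * p[z] for z in range(levels)) / std ** 3

    # Step 6: concatenate in the order of steps 2-5.
    return [mean, std, kurtosis, skewness]


if __name__ == "__main__":
    toy = [[0, 0], [2, 2]]  # toy 2x2 image with 4 grey levels
    print(statistical_features(toy, levels=4))
```

For a real X-ray the input would be the 128 × 128 greyscale image with 256 grey levels.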
Algorithm: Edge_Feature_Extraction()
Input: Grey Scale Image g(x, y)
Output: Edge Vector EF
Step 1: Compute the gradient in the x-direction using the Prewitt operator.
Step 2: Compute the gradient in the y-direction using the Prewitt operator.
Step 3: Compute the gradient in the x-direction using the Sobel operator.
Step 4: Compute the gradient in the y-direction using the Sobel operator.
Step 5: Compute the gradient of g(x, y) using the Canny edge operator.
Step 6: Compute the gradient of g(x, y) using the Laplacian edge operator.
Step 7: Compute the edge map of g(x, y) using binary thresholding.
Step 8: Concatenate all the features obtained in steps 1–7 to form the edge feature vector EF.
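A dependency-free sketch of most of Edge_Feature_Extraction() is given below. It is illustrative only: the helper names, the zero-valued border handling and the threshold of 128 are assumptions, and the 3 × 3 kernels are the common textbook forms of the Prewitt, Sobel and Laplacian operators. The Canny operator (step 5) is deliberately omitted because it is a multi-stage procedure that is normally taken from an image-processing library rather than written by hand.

```python
# Common textbook 3x3 kernels (sign and orientation conventions vary).
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def correlate3x3(image, kernel):
    """Apply a 3x3 kernel to a 2-D list image; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(
                kernel[j][i] * image[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
    return out


def edge_features(image, threshold=128):
    """Return EF: flattened edge maps for steps 1-4, 6 and 7 (Canny omitted)."""
    kernels = (PREWITT_X, PREWITT_Y, SOBEL_X, SOBEL_Y, LAPLACIAN)
    maps = [correlate3x3(image, k) for k in kernels]
    # Step 7: binary map; the threshold value is an ASSUMPTION.
    maps.append([[1.0 if v >= threshold else 0.0 for v in row] for row in image])
    # Step 8: concatenate all maps into one flat feature vector.
    return [v for m in maps for row in m for v in row]


if __name__ == "__main__":
    step = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]  # vertical intensity step
    ef = edge_features(step)
    print(len(ef), ef[4], ef[22])
```

With 128 × 128 inputs each map contributes 16,384 values, so the edge vector is far longer than the four-element statistical vector.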
Gradient and statistical features-based prediction system for COVID-19 367
Figure 6 An X-ray image and its edge maps using different gradient operators (a) Original image, (b) Edge map using Laplacian gradient
operator, (c) Edge map using Canny edge operator, (d) Edge map using Sobel gradient operator in x-direction, (e) Edge map
using Sobel gradient operator in y-direction, (f) Edge map using Sobel gradient operator in x- and y-direction, (g) binary
threshold image, (h) Edge map using Prewitt gradient operator in x-direction, (i) Edge map using Prewitt gradient operator in
y-direction
3.2 Image data set
The COVID-19 X-ray image data set used in this work is collected from the following web sources:
the Italian Society of Medical and Interventional Radiology, the Radiological Society of North
America and Radiopaedia (Dadario, 2020). This data set includes 219, 1345 and 1341 images with
confirmed COVID-19 pneumonia, common bacterial pneumonia and normal X-rays, respectively.
The data set is divided into training and test subsets in the ratio of 70% to 30%, i.e. the numbers of
images in the two subsets are 2033 and 872, respectively. The sample images from each class are
shown in Figure 7. All the images are resized to 128 × 128 and converted to grey scale before
feature extraction.
3.3 Simulation environment details
Four different classifiers are designed using the feature map as discussed in the methodology
section. The details of each experiment are provided in the following subsections. To measure the
performance of each classification model, precision, recall (sensitivity) and F-score are computed
from the confusion matrix.
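The 70%/30% division described in Section 3.2 can be checked with a short sketch. It assumes class counts of 219, 1345 and 1341 images, which sum to the 2905 images implied by the stated subset sizes of 2033 and 872. The function name, the fixed seed and the plain shuffled split are hypothetical choices; the text does not say whether the split was stratified.

```python
import random


def train_test_split_indices(n_images, train_fraction=0.7, seed=0):
    """Shuffle image indices and cut them into train and test lists."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # fixed seed: reproducible split
    n_train = int(train_fraction * n_images)  # rounds down
    return indices[:n_train], indices[n_train:]


# ASSUMED class counts (COVID-19, bacterial pneumonia, normal).
class_counts = {"covid": 219, "bacterial_pneumonia": 1345, "normal": 1341}
total = sum(class_counts.values())

train_idx, test_idx = train_test_split_indices(total)
print(total, len(train_idx), len(test_idx))  # 2905 2033 872
```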
Figure 7 Sample chest X-ray images (a) Chest X-ray of COVID-19 patient, (b) Chest X-ray of normal
person and (c) Chest X-ray of viral pneumonia patient
The bias of the decision tree classifier is to prefer smaller trees, i.e. either the depth should be low or
the number of nodes should be small. When to stop and which attribute to split on are some of the
factors that can affect the performance of decision tree classification. The concept of information
gain is useful when selecting a feature to split on. Information gain is a measure of the extent to
which uncertainty can be removed. Its value lies in [0, 1]. The term entropy is used to calculate
information gain. Entropy is a measure of uncertainty. It is expressed as follows:
\[ E(S) = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{5} \]
\[ IG(S, A) = E(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, E(S_v) \tag{6} \]
of the problem.
\[ Z_i = \sigma\left(W_i^{T} X + b_i\right) \tag{7} \]
Here, X represents the input to the unit, W is the weight function for the i-th layer, b is the bias
function for the i-th layer, Z is the output of the i-th layer, and σ is a non-linear activation
function, e.g. the sigmoid function; σ for different layers can be the same or different (Jain et al.,
2019).
While experimenting, the authors used an MLP model consisting of three dense layers, with 128
neurons in each of the first two layers and three neurons in the last dense layer. The first and second
dense layers are followed by a ReLU activation layer, while the last dense layer uses a softmax
activation layer. For regularisation, a dropout layer with a 0.15 dropout rate is used after the first
and second dense layers. The model is optimised using Root Mean Square propagation (RMSprop)
and the categorical cross-entropy loss function (Jain et al., 2018). The model is trained for 50 epochs
with a batch size of 32.
4 Result and discussion
4.1 Results
Figures 8(a), 8(b), 8(c) and 8(d) show the confusion matrices for the random forest, decision tree,
K-nearest neighbour and multilayer perceptron classification models on the test data. A confusion
matrix shows the performance of a classifier on test data diagrammatically.
Tables 1 and 2 show the precision, sensitivity and F-score performance metrics calculated from the
confusion matrices shown in Figure 8 for random forest, decision tree, K-nearest neighbour and
multilayer perceptron, respectively.
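As a minimal sketch of how such a confusion matrix can be built from true and predicted labels, consider the following. The function name, the class labels and the toy label lists are hypothetical and are not the paper's results; rows are assumed to index the true class and columns the predicted class.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) label pairs; rows = true, columns = predicted."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


if __name__ == "__main__":
    labels = ["covid", "normal", "pneumonia"]
    # Toy labels for illustration only.
    y_true = ["covid", "covid", "normal", "pneumonia", "pneumonia", "normal"]
    y_pred = ["covid", "normal", "normal", "pneumonia", "covid", "normal"]
    for row in confusion_matrix(y_true, y_pred, labels):
        print(row)
```

Correct predictions accumulate on the main diagonal, so a stronger classifier gives a matrix with smaller off-diagonal entries.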
Figure 8 Confusion matrix representing true and predicted labels separately for each classification model, (a) Confusion matrix for
random forest classification model, (b) Confusion matrix for decision tree classification model, (c) Confusion matrix for
k-nearest neighbour algorithm classification model, (d) Confusion matrix for multilayer perceptron classification model
Table 1 Results in terms of performance metrics using random forest and decision tree classification models
Table 2 Results in terms of performance metrics using K-nearest neighbour and multilayer perceptron classification models
4.2 Discussion and analysis
Precision is a measure of consistency. It represents the ability of the classifier to return only relevant
cases, i.e. the percentage of cases predicted as positive that are actually positive. Mathematically, it
can be represented by the following formula:
\[ P = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \quad \text{or} \quad \frac{\text{True Positive}}{\text{Predicted Results}} \tag{8} \]
So if the value of P is x%, this implies that when the classifier predicts a positive value, it will be
correct x% of the time.
Figure 9 shows the comparison of different classifiers on the scale of precision for different classes.
It was found that the random forest and multilayer perceptron classifiers show good ability in
finding the relevant class.
Sensitivity or recall represents the percentage of actual positive cases that were predicted as
positive. It represents the ability of the classifier to recognise all relevant cases correctly.
Mathematically, it can be represented by the following formula:
\[ R = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \quad \text{or} \quad \frac{\text{True Positive}}{\text{Actual Results}} \tag{9} \]
Figure 10 shows the comparison of different classifiers on the scale of recall for different classes. It
was found that the multilayer perceptron classifier shows the best ability to find all classes correctly.
Figure 10 Recall analysis for different classifiers
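Equations (8) and (9) can be evaluated per class from a confusion matrix, as in the sketch below. The function name and the 2 × 2 example matrix are hypothetical (not values from Tables 1 and 2), and rows are assumed to hold true classes and columns predicted classes. The F-score is included as the harmonic mean of precision and recall.

```python
def per_class_metrics(matrix):
    """Return a (precision, recall, f_score) tuple for every class."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[r][k] for r in range(n))  # column sum: TP + FP
        actual = sum(matrix[k])                          # row sum: TP + FN
        precision = tp / predicted if predicted else 0.0  # Equation (8)
        recall = tp / actual if actual else 0.0           # Equation (9)
        if precision + recall:
            f_score = 2 * precision * recall / (precision + recall)
        else:
            f_score = 0.0
        results.append((precision, recall, f_score))
    return results


if __name__ == "__main__":
    toy = [[8, 2], [1, 9]]  # illustrative counts only
    for p, r, f in per_class_metrics(toy):
        print(round(p, 3), round(r, 3), round(f, 3))
```

In the toy matrix the first class has 8 true positives out of 9 predicted and 10 actual cases, so its precision is 8/9 and its recall is 0.8.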
precision and sensitivity both represent model accuracy. Which of precision and recall should be
maximised depends upon the problem. However, there is another metric that considers both
precision and recall at the same time: the F1 score. Instead of balancing two metrics, the aim is
then to maximise the single F1 score. The F1 score is the harmonic mean of precision and recall.
\[ F_1\ \text{score} = 2 \cdot \frac{P \cdot R}{P + R} \]
Figure 11 shows the comparison of different classifiers on the scale of F1 for different classes. It
was found that the multilayer perceptron classifier shows the best ability to balance precision and
recall.
Figure 11 F1 analysis for different classifiers
Figure 12 shows the comparison of different classifiers on the scale of accuracy. It shows that the
multilayer perceptron classifier has the best accuracy relative to the other classifiers. The Receiver
Operating Characteristic (ROC) curve is another way to analyse the performance of different
classifiers. It depicts how the relationship between the true positive rate (recall) and the false
positive rate changes as the threshold value for identifying a positive case changes. This helps in
choosing a suitable threshold. By calculating the area under the ROC curve, we can judge the
performance of a classifier. This area lies between 0 and 1. The closer it is to 1, the better the
classifier. To further strengthen and validate the results, the receiver operating characteristic curve
is plotted for all four models. This is shown in Figures 13(a), 13(b), 13(c) and 13(d), corresponding
to the random forest, decision tree, k-nearest neighbour and multilayer perceptron models,
respectively. The high Area under the ROC Curve (AUC) score not only depicts the strength of the
multilayer perceptron model but also validates the results shown in Tables 1 and 2.
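The area under a ROC curve can be illustrated with a small sketch. A ROC curve plots the true positive rate against the false positive rate as the decision threshold is swept. The function names and toy scores below are hypothetical, the labels are binary (1 = positive), and the scores are assumed to be distinct; for the three-class problem one such curve per class (one-vs-rest) would be an assumption, as the text does not state how the curves in Figure 13 were produced.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold stepwise."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for k in order:
        if y_true[k] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    y_true = [1, 0, 1, 0]            # toy labels
    scores = [0.9, 0.8, 0.7, 0.1]    # toy classifier scores
    print(auc(roc_points(y_true, scores)))  # 0.75
```

A classifier that ranks every positive above every negative reaches an area of 1.0.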
Figure 13 Receiver operating characteristic curves separately for each classification model (a) ROC curve for random forest classification
model, (b) ROC curve for decision tree classification model, (c) ROC curve for k-nearest neighbour algorithm classification
model, (d) ROC curve for multi-layer perceptron classification model (continued)
5 Conclusions
The effect of the COVID-19 pandemic will continue until vaccination is available. Owing to
insufficient healthcare facilities in highly populated developing countries like Brazil, India, etc., the
situation is becoming worse day by day. Therefore, in the current scenario, medical practitioners are
trying to use the available radiography infrastructure for detecting COVID-19. The work proposed
in this paper aims to help healthcare workers. Image processing functions have been used to find
statistical features and edge-based features for chest X-ray images, available online, of COVID-19
patients, pneumonia patients and normal persons. 70% of this data was used for training decision
tree, random forest, k-nearest neighbour and multilayer perceptron classifiers. The remaining 30%
was used for testing the classifier models. It was found that the multilayer perceptron classifier gave
94% accuracy in detecting COVID-19 patients from chest X-rays.
To continue their work of helping healthcare professionals, the authors have started work on
detecting COVID-19 from CT-scan images of the chest.
References
Apostolopoulos, I.D. and Mpesiana, T.A. (2020) ‘COVID-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks’, Physical and Engineering Sciences in Medicine, Vol. 43, pp.635–640.
Bassi, P.R. and Attux, R. (2020) ‘A deep convolutional neural network for COVID-19 detection using chest X-rays’, arXiv preprint arXiv:2005.01578.
Dadario, A.M.V. (2020) COVID-19 X rays, Kaggle. Doi: 10.34740/KAGGLE/DSV/1019469.
Ghoshal, B. and Tucker, A. (2020) ‘Estimating uncertainty and interpretability in deep learning for coronavirus (COVID-19) detection’, arXiv preprint arXiv:2003.10769.
Hall, L.O., Paul, R., Goldgof, D.B. and Goldgof, G.M. (2020) ‘Finding COVID-19 from chest X-rays using deep learning on a small dataset’, arXiv preprint arXiv:2004.02060.
Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y. and Cheng, Z. (2020) ‘Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China’, The Lancet, pp.497–506.
Jain, A., Tiwari, S. and Sapra, V. (2018) Hands on Deep Learning with Python Programming, Lambert Academic Publishing House. Doi: 10.18280/ria.340413.
Jain, A., Tiwari, S. and Sapra, V. (2019) ‘Two-phase heart disease diagnosis system using deep learning’, International Journal of Control and Automation, Vol. 12, No. 5, pp.558–573.
Khan, A.I., Shah, J.L. and Bhat, M.M. (2020) ‘Coronet: a deep neural network for detection and diagnosis of COVID-19 from chest X-ray images’, Computer Methods and Programs in Biomedicine. Doi: 10.1016/j.cmpb.2020.105581.
Kong, W. and Agarwal, P.P. (2020) ‘Chest imaging appearance of COVID-19 infection’, Radiology: Cardiothoracic Imaging, Vol. 2, No. 1. Doi: 10.1148/ryct.2020200028.
Kotsiantis, S.B., Zaharakis, I. and Pintelas, P. (2007) ‘Supervised machine learning: a review of classification techniques’, Emerging Artificial Intelligence Applications in Computer Engineering, Vol. 160, pp.3–24.
Letko, M.C. and Munster, V. (2020) ‘Functional assessment of cell entry and receptor usage for lineage B β-coronaviruses, including 2019-nCoV’, BioRxiv. Doi: 2020.01.22.915660.
Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y. and Xing, X. (2020) ‘Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia’, New England Journal of Medicine. Doi: 10.1056/NEJMoa2001316.
Ng, M.Y., Lee, E.Y., Yang, J., Yang, F., Li, X., Wang, H. and Hui, C.K.M. (2020) ‘Imaging profile of the COVID-19 infection: radiologic findings and literature review’, Radiology: Cardiothoracic Imaging, Vol. 2, No. 1. Doi: 10.1148/ryct.2020200034.
Noi, P.T. and Kappas, M. (2018) ‘Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery’, Sensors, Vol. 18, No. 1, pp.1–20.
Radiology Assistant (n.d.) Radiology Assistant. Available online at: https://radiologyassistant.nl/chest/lk-jg-1#chest-radiograph
Rajaraman, S., Siegelman, J., Alderson, P.O., Folio, L.S., Folio, L.R. and Antani, S.K. (2020) ‘Iteratively pruned deep learning ensembles for COVID-19 detection in chest X-rays’, arXiv preprint arXiv:2004.08379.
Song, Z., Xu, Y., Bao, L., Zhang, L., Yu, P., Qu, Y. and Qin, C. (2019) ‘From SARS to MERS, thrusting coronaviruses into the spotlight’, Viruses, Vol. 11, No. 1, pp.1–29.
Tiwari, S. (2017) ‘A pattern classification based approach for blur classification’, Indonesian Journal of Electrical Engineering and Informatics (IJEEI), Vol. 5, No. 2, pp.162–173.
Tiwari, S. (2020) ‘A comparative study of deep learning models with handcraft features and non-handcraft features for automatic plant species identification’, International Journal of Agricultural and Environmental Information Systems (IJAEIS), Vol. 11, No. 2, pp.44–57.
Van Doremalen, N., Bushmaker, T., Morris, D.H., Holbrook, M.G., Gamble, A., Williamson, B.N. and Lloyd-Smith, J.O. (2020) ‘Aerosol and surface stability of SARS-CoV-2 as compared with SARS-CoV-1’, New England Journal of Medicine, Vol. 382, No. 16, pp.1564–1567.
WORLDOMETERS (n.d.) Reported Cases and Deaths by Country or Territory. Worldometer Coronavirus Population. https://www.worldometers.info/coronavirus/?utm_campaign=homeAdvegas1?%20
Wu, J.T., Leung, K. and Leung, G.M. (2020) ‘Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study’, The Lancet, pp.689–697.