Article info

Article history:
Received 13 October 2020
Revised 18 December 2020
Accepted 30 December 2020
Available online 9 January 2021

Keywords:
COVID-19
SARS-CoV-2
K-Nearest Neighbors
CpG islands
Human coronaviruses

Abstract

Several viral epidemics, such as those caused by the severe acute respiratory syndrome coronavirus and the Middle East respiratory syndrome coronavirus, have been detected in the last two decades. The coronavirus disease 2019 (COVID-19) is a pandemic caused by a novel betacoronavirus called severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). After the rapid spread of COVID-19, many researchers quickly began investigating the diagnosis and treatment of this disease. Distinguishing COVID-19 from the other types of coronaviruses is a difficult problem due to their genetic similarity. In this study, we propose a new efficient COVID-19 detection method based on the K-nearest neighbors (KNN) classifier using the complete genome sequences of human coronaviruses recorded in the 2019 Novel Coronavirus Resource. We also describe two features based on CpG islands that efficiently detect COVID-19 cases. Thus, genome sequences of approximately 30,000 nucleotides can be represented by only two real numbers. The KNN method is a simple and effective non-parametric technique for solving classification problems. However, the performance of the KNN depends on the distance measure used. We evaluate 19 distance metrics, grouped into five categories, to improve the performance of the KNN algorithm. Several performance measures are computed to evaluate the proposed method. The proposed method achieves 98.4% precision, 99.2% recall, 98.8% F-measure, and 98.4% accuracy in a few seconds when any L1 type metric is used as the distance measure in the KNN.
© 2020 Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
https://doi.org/10.1016/j.jestch.2020.12.026
2215-0986
H. Arslan and H. Arslan Engineering Science and Technology, an International Journal 24 (2021) 839–847
The most common symptoms of COVID-19 are cough, fever, gastrointestinal and musculoskeletal symptoms, as well as loss of taste or smell. Shortness of breath and chest pressure or pain are among the less common symptoms [7]. Because these common symptoms are similar to those of the common flu, it is difficult to conduct an early diagnosis of SARS-CoV-2. It is essential to quickly identify positive cases, since the virus spreads rapidly and poses a threat to the public health system. Furthermore, there is still no specific antiviral drug recommended for the treatment of the novel coronavirus disease other than supportive care. In pre-clinical research, some medications already prescribed for other diseases have shown positive effects against the COVID-19 virus. Due to the absence of an effective COVID-19 treatment, early detection of this life-threatening infectious disease plays a very important role in all medical therapies and in the prevention of the spread of the disease. Recently, many data scientists have been working extensively on remarkable features of the virus. For this purpose, artificial intelligence applications and machine learning methods have also been used successfully to speed up the diagnosis of COVID-19 cases [8–10]. Therefore, learning-based classification models will not only reduce the burden on healthcare professionals but will also facilitate early diagnosis of the disease.

In this paper, we introduce an effective classification method to distinguish SARS-CoV-2 from common human coronaviruses, namely AlphaCoV, BetaCoV-1, MERS-CoV, HKU1-CoV, NL63-CoV, and 229E-CoV, using CpG island features and the KNN classification method. The main contributions of this research can be summarized as follows:

- The choice of discriminative features based on the characteristics of SARS-CoV-2 is a critical step to improve classification performance. In this study, we propose CpG island features to differentiate SARS-CoV-2 from the other human coronaviruses.
- We propose a robust prediction method that uses the KNN classifier with any L1 type distance metric, selected from among 19 distance metrics investigated in five categories: L1 type, L2 type, vicissitude, inner product, and other metrics.
- We construct a larger dataset containing almost all types of the human coronaviruses that are genetically similar to SARS-CoV-2. They are AlphaCoV, BetaCoV-1, MERS-CoV, NL63-CoV, HKU1-CoV, and 229E-CoV.
- The proposed method efficiently detects COVID-19 cases in a few seconds on the relatively large dataset.

The rest of the paper is organized as follows. In Section 2, a literature survey on COVID-19 is presented. In Section 3, the K-nearest neighbors (KNN) method and the distance measures used in the KNN method are summarized. The proposed classification strategy to detect COVID-19 cases is introduced in Section 4. Section 5 reports experimental results and an evaluation of the experiments conducted with various distance metrics. In Section 6, we compare the proposed method with other detection methods. Finally, Section 7 presents concluding remarks and future directions.

2. Related works

Deep learning and machine learning techniques have been widely used in a variety of research areas such as big data analysis [11–13], image classification [14], face detection [15,16], and disease prediction [17–21]. Computer-based diagnostic systems developed with the help of artificial intelligence techniques will speed up early diagnosis of COVID-19 and thus will decrease the workload on health workers. Chen et al. [22] published a comprehensive survey of studies in the literature that use artificial intelligence in the fight against COVID-19. In addition, they observed that machine learning, deep learning and artificial neural network technologies have been successfully implemented at almost every stage of combating COVID-19 when compared to other coronaviruses such as SARS-CoV and MERS-CoV.

Randhawa et al. [23] suggested a combination of supervised machine learning with digital signal processing methods (ML-DSP) for an accurate and scalable taxonomic classification of genomic sequences. They mapped each genomic sequence to discrete values corresponding to its genomic signals. They employed six supervised machine learning classifiers (linear discriminant, linear support vector machine, quadratic support vector machine, fine KNN, subspace discriminant, and subspace KNN) to detect SARS-CoV-2. They used 29 SARS-CoV-2 genome sequences and 20 genome sequences for each of alphacoronavirus, betacoronavirus, and delta-coronavirus. They observed that the linear discriminant method achieved 100% accuracy. Naeem et al. [24] proposed a method to distinguish COVID-19, SARS-CoV, and MERS-CoV viruses by using the K-nearest neighbors and the trainable cascade-forward back-propagation neural network methods. They extracted genomic signal processing features using a dataset that contains 76 genome sequences for each type of coronavirus from the National Center for Biotechnology Information (NCBI). Their results showed that the performance of the KNN algorithm was higher than that of the cascade-forward back-propagation neural network in all COVID-19/SARS-CoV, COVID-19/MERS-CoV and COVID-19/SARS-CoV/MERS-CoV classification processes, and that it achieved an accuracy of 100%. Batista et al. [25] developed a new method to predict whether patients in the emergency care unit are COVID-19 positive or not by using five supervised machine learning algorithms: neural networks, random forests, logistic regression, support vector machines, and gradient boosting regression trees. They collected data from 235 patients and used 15 different types of features such as age, gender, and hemoglobin. Their experimental results showed that the support vector machine classifier achieved the highest performance among the classifiers, with an AUC value of 84.7%.

Unal and Dudak [26] studied the diagnosis of COVID-19 viral disease. They applied four supervised machine learning algorithms (Naive Bayes, KNN, support vector machines and decision tree) to the COVID-19 Mexico Patient Health Dataset. The dataset consists of 95839 cases recorded by the Mexican government and 19 different types of features, such as the sex of the patient, the age of the patient, the state of pneumonia and intubation, as well as the state of many other diseases. Their experimental results showed that the support vector machine achieved the best predictive performance, with a classification accuracy of 100%.

Although there are only a few studies detecting COVID-19 cases from genome sequences, a number of papers have recently been published on detecting COVID-19 cases from X-ray or computed tomography (CT) images [27–34]. Barstugan et al. [27] developed a COVID-19 classification method by using the support vector machine classifier on a dataset including different types of 150 abdominal CT images. Their experimental results showed that their proposed method achieved an accuracy of 99.6%. Ozturk et al. [28] applied the DarkCovidNet deep learning model to raw chest X-ray images and generated a binary classification (COVID-19, no findings). They also used the DarkCovidNet classifier on the same dataset to create a triple classification (COVID-19, no findings, pneumonia). They reported that the highest accuracy of the classifier was 98.08% for the binary classification and 87.02% for the triple case. Sekeroglu et al. [29] proposed an alternative COVID-19 detection method by using deep learning and machine
learning classifiers on a publicly available dataset which contains 1583 healthy, 4292 pneumonia and 225 confirmed COVID-19 chest X-ray images. They stated that a convolutional neural network (CNN) without pre-processing and with minimized layers achieved an accuracy of 98.50%. Jain et al. [30] developed a new approach to detect COVID-19 cases among chest X-ray images of healthy, bacterial pneumonia, viral pneumonia and COVID-19 cases by using the ResNet18, ResNet101, DenseNet121 and VGG-16 deep learning models. Their experimental results demonstrated that ResNet101 achieved the highest accuracy of 98.93%. Apostolopoulos and Bessiana [31] used the VGG19, Mobile Net, Inception, Xception and Inception ResNet v2 deep transfer learning classifiers to detect COVID-19 positive cases among 224 COVID-19, 700 bacterial pneumonia and 504 normal chest X-ray images. Their results showed that VGG19 achieved the best binary classification accuracy, 98.75%, among the CNN methods.

Ahuja et al. [32] used deep learning methods to detect COVID-19 positive cases from chest X-ray images. Asnaoui and Chawki [33] constructed a novel method for detecting COVID-19 from pneumonia chest X-ray and tomography images. They evaluated recent deep learning models: VGG16, VGG19, DenseNet201, Inception_ResNet_V2, Inception_V3, Resnet50, and MobileNet_V2. They reported that Inception_ResNet_V2 achieved the best accuracy of 92.18%. Basu et al. [34] proposed an alternative screening method for COVID-19, called Domain Extension Transfer Learning. They extracted discriminative features from the chest X-ray dataset. They then classified the images as normal, pneumonia, other diseases, and COVID-19 with an accuracy of 95.3%.

3. Background

We propose a new COVID-19 detection method based on CpG island features and the KNN classifier. The KNN is extremely useful for large data classification, and its performance depends on the distance metric used [35–38]. Several studies have been conducted to detect optimum metrics for the KNN algorithm [39,40]. In this section, first, we provide a brief description of the KNN method and some recent studies based on the KNN. Second, we investigate the metrics used in the KNN classifier under five categories to improve the performance of the KNN classifier on our model.

3.1. K-nearest neighbors

K-nearest neighbors (KNN), a supervised machine learning algorithm, can be efficiently used to solve classification problems. The KNN was introduced in 1951 by [41] and then recast in 1967 by [42]. The KNN is a non-parametric classifier known as one of the simplest and laziest algorithms. That is, there is no need to create a learning model in this classification method. Despite the lazy structure of the KNN, it was listed as one of the 10 most effective methods for analyzing information in a given database [43]. In the prediction process, the class to which a new observation belongs is determined by calculating the shortest distances between the observation sample and its K nearest neighbor samples.

There are some recent studies that aim to improve the performance of the KNN. To decrease the sensitivity to the neighborhood size k and improve the voting strategy in the region of neighborhoods, Gou et al. [44] proposed two k-nearest neighbor rules. Their experimental results demonstrate that the proposed methods are less sensitive to k. Gou et al. [45] also addressed the sensitivity to k by introducing the generalized mean distance-based KNN classifier. They stated that the proposed method is less sensitive to k than the KNN-based classifiers.

3.2. Metrics

To improve the performance of the KNN classifier, we investigate the metrics used in the KNN algorithm under five categories, following the categorization in [39,40]. We list them below:

1. L1 type metrics: Six metrics are investigated in this category. They are the Manhattan metric (ManM), Chebyshev metric (ChebM), Canberra metric (CanM), Sorensen metric (SM), Kulczynski metric (KM), and Mean character metric (MCM), which are explained in Table 1.
2. L2 type metrics: Six metrics are investigated in this category. They are the Euclidean metric (EM), Clark metric (ClaM), Neyman χ² metric (NCSM), Squared χ² metric (SquM), Divergence metric (DivM), and Squared Chi-squared metric (SCSM). We list these metrics in Table 2.
3. Vicissitude metrics: Three metrics are considered in this category. They are the Vicis Symmetric 1 metric (VSDFM1), Vicis Symmetric 2 metric (VSDFM2) and Vicis Symmetric 3 metric (VSDFM3). Definitions of these metrics are given in Table 3.
4. Inner product metrics: Two metrics are investigated in this category. They are the Dice metric (DicM) and Chord metric (ChoM). Definitions of these metrics are given in Table 4.
5. Other metrics: In this category, we evaluate two metrics that significantly affect the performance of the KNN algorithm. These metrics are the Motyka (MotM) and Hassanat (HasM) metrics. Their definitions are shown in Table 5, where

\[
D(x_i, y_i) =
\begin{cases}
1 - \dfrac{1 + \min(x_i, y_i)}{1 + \max(x_i, y_i)}, & \text{if } \min(x_i, y_i) \ge 0 \\[2ex]
1 - \dfrac{1 + \min(x_i, y_i) + \lvert \min(x_i, y_i) \rvert}{1 + \max(x_i, y_i) + \lvert \min(x_i, y_i) \rvert}, & \text{if } \min(x_i, y_i) < 0
\end{cases}
\]

Table 1
L1 type metrics.
Abbreviation | Metric | Definition
ManM | Manhattan metric | $\sum_{i=1}^{n} \lvert x_i - y_i \rvert$
ChebM | Chebyshev metric | $\max_{1 \le i \le n} \lvert x_i - y_i \rvert$
CanM | Canberra metric | $\sum_{i=1}^{n} \frac{\lvert x_i - y_i \rvert}{\lvert x_i \rvert + \lvert y_i \rvert}$
KM | Kulczynski metric | $\frac{\sum_{i=1}^{n} \lvert x_i - y_i \rvert}{\sum_{i=1}^{n} \min(x_i, y_i)}$

4. Proposed method

In this section, we present a new COVID-19 detection method based on the KNN classifier and CpG island features. The main steps of the proposed method are described in Algorithm 1. The first step of the algorithm is feature extraction. Extracting robust and discriminative features from human coronavirus genome sequences is the most critical step to improve the diagnosis of SARS-CoV-2. In this step, we propose to use CpG based features, since CpG dinucleotides in the open reading frames of SARS-CoV-2 have an extremely low frequency [46]. The main reason for the lower CpG dinucleotide density is the mutation of C into A and G into T [47,48]. The proposed features are extracted by using Eq. (1) and Eq. (2).

\[
CG_p = \mathrm{ratio}(C) + \mathrm{ratio}(G) \tag{1}
\]
\[
CpG_o = \frac{\mathrm{ratio}(CG)}{\mathrm{ratio}(C)\,\mathrm{ratio}(G)} \tag{2}
\]

where ratio(C), ratio(G), and ratio(CG) are computed by dividing the number of occurrences of C, G, and CG in the sequence, respectively, by the sequence length. Thus, each sequence containing 30,000 nucleotides is represented by two features only. Fig. 1 provides an example of how the features are calculated from a part of the sequence.

After the feature extraction step, we apply the KNN method to classify SARS-CoV-2 sequences. The performance of the KNN mainly depends on the metric that is used to compute the distances between different data samples. To improve the performance of the KNN algorithm, we evaluate 19 distance metrics investigated in five categories: L1 type, L2 type, vicissitude, inner product, and other metrics. In this step, we propose to use an L1 type metric as the distance measure in the KNN.

Table 4
Inner product metrics.
Abbreviation | Metric | Definition
DicM | Dice metric | $1 - \frac{2 \sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} y_i^2}$
ChoM | Chord metric | [...]

Table 5
Other metrics: Motyka and Hassanat metrics.
Abbreviation | Metric | Definition
MotM | Motyka metric | $\frac{\sum_{i=1}^{n} \max(x_i, y_i)}{\sum_{i=1}^{n} (x_i + y_i)}$
HasM | Hassanat metric | $\sum_{i=1}^{n} D(x_i, y_i)$

5. Experimental results

In this section, we evaluate the performance of the proposed method. Before discussing the results, we first describe our dataset, then the experimental setup, and then briefly introduce the performance measures. Finally, we will [...]

5.1. Dataset

The 2019 novel coronavirus resource (2019nCoVR) [49] at the China National Center for Bioinformation is one of the most important sources of various types of coronaviruses. It integrates several important databases, including GISAID, NCBI, NMDC and CNCB/NGDC. In this study, we used complete genomic sequences of human coronaviruses obtained from 2019nCoVR. The genome sequence of each coronavirus is approximately 30,000 nucleotides long. The properties of the human coronavirus sequences are presented in Table 6. The dataset includes various types of human coronaviruses such as AlphaCoV, BetaCoV-1, MERS-CoV, NL63-CoV, HKU1-CoV and 229E-CoV, as well as SARS-CoV-2. We refer to the SARS-CoV-2 sequences as COVID-19 positive and to the sequences of the other coronaviruses as COVID-19 negative. In addition to 1000 SARS-CoV-2 sequences, we used 592 genome sequences of other human coronaviruses in our experiments. We note that all available genome sequences of human coronaviruses other than SARS-CoV-2 were downloaded.

5.2. Experimental setup

The experiments were performed using a Core i7 2.7 GHz processor and 16 GB RAM under the Linux operating system. Feature extraction [...]
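
The feature extraction step of Eq. (1) and Eq. (2) can be illustrated with a minimal Python sketch. The function name `cpg_features`, the upper-casing of the input, and the choice to return 0.0 for CpGo when a sequence contains no C or no G are assumptions made for this illustration; they are not taken from the authors' implementation.

```python
from typing import Tuple


def cpg_features(sequence: str) -> Tuple[float, float]:
    """Return (CGp, CpGo) for a nucleotide sequence, following Eq. (1) and Eq. (2).

    Hypothetical helper: the name and the edge-case handling are assumptions.
    """
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    # Each ratio is a number of occurrences divided by the sequence length.
    ratio_c = seq.count("C") / n
    ratio_g = seq.count("G") / n
    ratio_cg = seq.count("CG") / n  # "CG" cannot overlap itself, so count() is exact

    cgp = ratio_c + ratio_g  # Eq. (1)
    if ratio_c == 0 or ratio_g == 0:
        cpgo = 0.0  # assumption: avoid division by zero when C or G is absent
    else:
        cpgo = ratio_cg / (ratio_c * ratio_g)  # Eq. (2)
    return cgp, cpgo


# Toy fragment, not a real genome: 8 nucleotides, 2 C, 3 G, 2 CG dinucleotides.
cgp, cpgo = cpg_features("ACGTCGGA")
print(cgp, cpgo)
```

For the toy fragment, ratio(C) = 2/8, ratio(G) = 3/8 and ratio(CG) = 2/8, so CGp = 0.625 and CpGo = 0.25 / (0.25 × 0.375) ≈ 2.667.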
Table 6
The properties of complete genome sequences of human coronaviruses.
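
The classification step described in Section 4 can be sketched as a plain KNN with a pluggable distance function. This is only an illustration under stated assumptions: the value k = 3, the majority-vote rule, the function names, and the toy feature values below are all hypothetical and are not taken from the paper's experiments. The Manhattan and Hassanat distances follow the definitions given in Section 3.2.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def manhattan(x: Vector, y: Vector) -> float:
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def hassanat(x: Vector, y: Vector) -> float:
    """Hassanat distance: sum of the per-coordinate D(x_i, y_i) terms."""
    total = 0.0
    for a, b in zip(x, y):
        lo, hi = min(a, b), max(a, b)
        shift = abs(lo) if lo < 0 else 0.0  # the |min| term only applies when min < 0
        total += 1.0 - (1.0 + lo + shift) / (1.0 + hi + shift)
    return total


def knn_predict(train: Sequence[Vector], labels: Sequence[Hashable],
                query: Vector, k: int = 3, metric: Metric = manhattan) -> Hashable:
    """Predict the label of `query` by majority vote among its k nearest samples."""
    order = sorted(range(len(train)), key=lambda i: metric(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Invented (CGp, CpGo) pairs for illustration only; not values from the dataset.
TRAIN = [(0.38, 0.40), (0.37, 0.42), (0.39, 0.41),
         (0.42, 0.60), (0.41, 0.62), (0.43, 0.58)]
LABELS = ["positive"] * 3 + ["negative"] * 3

print(knn_predict(TRAIN, LABELS, (0.38, 0.43)))                   # L1 metric
print(knn_predict(TRAIN, LABELS, (0.42, 0.59), metric=hassanat))  # Hassanat metric
```

Passing the metric as a function argument makes it straightforward to repeat the same experiment with each candidate distance and compare the resulting scores.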
Table 7
Average of the precision, recall, and F-measure values.
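
The sketch below shows how precision, recall, F-measure and accuracy are commonly computed from the counts of a binary confusion matrix. It assumes the standard textbook definitions, since the paper's own formulas are not reproduced in this section; the function names and the confusion-matrix counts in the example are hypothetical.

```python
from typing import Dict


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def classification_scores(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary scores from true/false positive and negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only; not results from the paper.
print(classification_scores(tp=90, fp=10, fn=5, tn=95))

# Consistency check on the reported averages: 98.4% precision and 99.2% recall
# give an F-measure of about 98.8% under this definition.
print(round(f_measure(0.984, 0.992), 3))
```

Under these definitions, the precision and recall values reported in the abstract are consistent with the reported F-measure.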
Table 8
Comparison of existing state-of-the-art classification studies.
Study | Method | Dataset | Classification task | Accuracy (%)
Naeem et al. [24] | Discrete Fourier Transform with KNN; Discrete Cosine Transform with KNN; Seven Moment Invariants with KNN | 76 COVID-19 sequences; 76 SARS-CoV sequences; 76 MERS-CoV sequences | COVID-19 vs SARS-CoV vs MERS-CoV | 100
Proposed method | KNN with L1 type metrics; CpG based features | 1000 COVID-19 sequences; 592 other coronavirus sequences | COVID-19 vs other types of coronaviruses | 98.4
Unal and Dudak [26] | Naive Bayes; KNN; Support Vector Machines; Decision Tree Algorithm | 95839 cases with 19 types of laboratory findings | COVID-19 vs other diseases | 100
Barstugan et al. [27] | Support Vector Machine | 150 abdominal CT images | COVID-19 vs other viral pneumonias | 99.6
Apostolopoulos and Bessiana [31] | Deep Transfer Learning | 224 COVID-19 chest X-ray images; 700 images of bacterial pneumonia; 504 images of normal conditions | COVID-19 vs pneumonia vs normal | 98.75
Ozturk et al. [28] | DarkCovidNet | Raw chest X-ray images | COVID-19 vs healthy vs pneumonia | 87.02
Basu et al. [34] | Domain Extension Transfer Learning | Chest X-ray images | COVID-19 vs normal vs pneumonia vs other diseases | 95.3
[...] extraction method is expensive. Another disadvantage of the model established by Naeem et al. [24] is that they work with a smaller dataset, just like Randhawa et al. [23].

Next, we discuss the advantages and drawbacks of the methods that use other datasets. Unal and Dudak [26] used the Mexico Patient Health Dataset to detect COVID-19 cases. Although their dataset is large, it only represents the features of a specific region. Furthermore, they used 19 different features, including the sex of the patient, the age of the patient, the state of pneumonia and intubation, as well as the state of many other diseases. One of the advantages of our model over the method of Unal and Dudak [26] is that it uses only two powerful and effective features derived from the complete genome sequences of human coronaviruses. Thus, our proposed method detects COVID-19 positive cases within a few seconds, whereas Unal and Dudak [26] did not report the detection time. Another difference between the proposed method and theirs is that they use the classical KNN classifier with default parameters, whereas we use the KNN classifier with an optimum distance metric in our proposed model.

There are a number of existing studies in the literature that detect COVID-19 cases from image datasets. In Table 8, we also list six image based studies achieving remarkable accuracy values. These image based methods used different deep learning or machine learning methods. Although image based studies have a higher accuracy, given the high mutation rate of SARS-CoV-2, using genomic sequences is extremely beneficial for tracking coronavirus genes that change frequently as the disease spreads from one person to another. Moreover, the radiation generated by X-ray or CT scanning machines may cause permanent damage to people. For this reason, X-ray or CT scans may not be obtainable for some people, which can be considered a disadvantage of image based studies.

[...] step, we propose to use CpG island features. Each genome sequence of a human coronavirus, which includes about 30,000 nucleotides, is represented by two real numbers only. The feature extraction step takes just a few seconds. Second, the KNN method is used to separate COVID-19 positive cases from the other types of human coronaviruses: AlphaCoV, BetaCoV-1, MERS-CoV, NL63-CoV, HKU1-CoV, and 229E-CoV. The KNN classifier is the simplest method and has high flexibility for solving complex classification problems. The accuracy of the KNN is higher than that of state-of-the-art classifiers in certain cases, and it often produces efficient performance. However, the performance of the KNN greatly depends on the metric used. To detect the most appropriate metric, we review five groups of metrics used in the KNN classifier. The selection of different distance metrics for the KNN can result in a variation in accuracy for the same dataset. Experimental results reveal that the proposed method achieves the highest accuracy, which is 98.4% on average, in a few seconds when L1 type metrics are used as the distance measure in the KNN. In future studies, we will compare human SARS-CoV-2 sequences to other types of coronavirus sequences such as bat SARS-CoV-like coronaviruses 2, and propose a similarity based feature to increase overall accuracy. In addition, the factors affecting the recovery status of patients suffering from COVID-19 may be investigated in future studies by combining machine learning and parallel computing methods with effective features.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
[7] L. Fu, B. Wang, T. Yuan, X. Chen, Y. Ao, T. Fitzpatrick, P. Li, Y. Zhou, Y. fan Lin, Q. Duan, G. Luo, S. Fan, Y. Lu, A. Feng, Y. Zhan, B. Liang, W. Cai, L. Zhang, X. Du, L. Li, Y. Shu, H. Zou, Clinical characteristics of coronavirus disease 2019 (covid-19) in china: A systematic review and meta-analysis, J. Infect. 80 (6) (2020) 656–665. https://doi.org/10.1016/j.jinf.2020.03.041.
[8] R. Vaishya, M. Javaid, I. Khan, A. Haleem, Artificial intelligence (ai) applications for covid-19 pandemic, Diab. Metab. Syndrome Clin. Res. Rev. 14. doi: 10.1016/j.dsx.2020.04.012.
[9] G.G. Waleed Salehi A, Baglat P, Review on machine and deep learning models for the detection and prediction of coronavirus, Mater. Today Proc. doi: 10.1016/j.matpr.2020.06.245.
[10] M.A. Dey L., S. Chakraborty, Machine learning techniques for sequence-based prediction of viral–host interactions between sars-cov-2 and human proteins, Biomed. J. https://doi.org/10.1016/j.bj.2020.08.003.
[11] P. Gupta, A. Sharma, R. Jindal, Scalable machine-learning algorithms for big data analytics: a comprehensive review, WIREs Data Min. Knowl. Discovery 6 (6) (2016) 194–214. doi: 10.1002/widm.1194.
[12] M.M. Najafabadi, F. Villanustre, T.M. Khoshgoftaar, N. Seliya, R. Wald, E. Muharemagic, Deep learning applications and challenges in big data analytics, J. Big Data 2 (1) (2015) 194–214, https://doi.org/10.1186/s40537-014-0007-7.
[13] T.T. Zin, J.C.W. Lin, Big data analysis and deep learning applications: proceedings of the first international conference on big data analysis and deep learning, 2019. URL https://www.springer.com/gp/book/9789811308680.
[14] T. Das, Machine learning algorithms for image classification of hand digits and face recognition dataset, Int. Res. J. Eng. Technol. 4 (12) (2017) 640–649.
[15] M. Sharma, J. Anuradha, H. Manne, G. Kashyap, Facial detection using deep learning, IOP Conf. Ser.: Mater. Sci. Eng. 263 (2017) 042092, https://doi.org/10.1088/1757-899X/263/4/042092.
[16] P. Viola, M. Jones, Robust real-time face detection, Int. J. Comput. Vision 57 (2004) 137–154, https://doi.org/10.1023/B:VISI.0000013087.49260.fb.
[17] K. Lewandowski, Y. Xu, S. Pullan, S. Lumley, D. Foster, N. Sanderson, A. Vaughan, M. Morgan, N. Bright, J. Kavanagh, R. Vipond, M. Carroll, A. Marriott, K. Gooch, M. Andersson, K. Jeffery, T. Peto, D. Crook, A. Walker, P. Matthews, Metagenomic nanopore sequencing of influenza virus direct from clinical respiratory samples, J. Clin. Microbiol. 58. doi: 10.1128/JCM.00963-19.
[18] L. Kafetzopoulou, K. Efthymiadis, K. Lewandowski, A. Crook, D. Carter, J. Osborne, E. Aarons, R. Hewson, J. Hiscox, M. Carroll, R. Vipond, S. Pullan, Assessment of metagenomic nanopore and illumina sequencing for recovering whole genome sequences of chikungunya and dengue viruses directly from clinical samples, Eurosurveillance 23. doi: 10.2807/1560-7917.ES.2018.23.50.1800228.
[19] A. Khanday, S. Rabani, Q. Khan, et al., Machine learning based approaches for detecting covid-19 using clinical text data, Int. J. Inf. Technol. 12 (2020) 731–739, https://doi.org/10.1007/s41870-020-00495-9.
[20] L. Li, L. Qin, Z. Xu, Y. Yin, X. Wang, B. Kong, J. Bai, Y. Lu, Z. Fang, Q. Song, K. Cao, D. Liu, G. Wang, Q. Xu, X. Fang, S. Zhang, J. Xia, J. Xia, Using artificial intelligence to detect covid-19 and community-acquired pneumonia based on pulmonary ct: evaluation of the diagnostic accuracy, Radiology 296 (2) (2020 Aug) E65–E71, PMID: 32191588. doi: 10.1148/radiol.2020200905.
[21] A. Alimadadi, S. Aryal, I. Manandhar, P.B. Munroe, B. Joe, X. Cheng, Artificial intelligence and machine learning to fight covid-19, Physiol. Genom. 52 (4) (2020) 200–202, PMID: 32216577. doi: 10.1152/physiolgenomics.00029.2020.
[22] J. Chen, K. Li, Z. Zhang, K. Li, P.S. Yu, A Survey on Applications of Artificial Intelligence in Fighting Against COVID-19, arXiv e-prints (2020) arXiv:2007.02202.
[23] G.S. Randhawa, M.P.M. Soltysiak, H. El Roz, C.P.E. de Souza, K.A. Hill, L. Kari, Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: Covid-19 case study, vol. 15, Public Library of Science, 2020, pp. 1–24. https://doi.org/10.1371/journal.pone.0232391.
[24] S.M. Naeem, M.S. Mabrouk, S.Y. Marzouk, M.A. Eldosoky, A diagnostic genomic signal processing (GSP)-based system for automatic feature analysis and detection of COVID-19, Brief. Bioinf. bbaa170. https://doi.org/10.1093/bib/bbaa170.
[25] A.F.d.M. Batista, J.L. Miraglia, T.H.R. Donato, A.D.P. Chiavegatto Filho, Covid-19 diagnosis prediction in emergency care patients: a machine learning approach, medRxiv. doi: 10.1101/2020.04.04.20052092. https://www.medrxiv.org/content/early/2020/04/14/2020.04.04.20052092.
[26] Y. Ünal, M.N. Dudak, Classification of covid-19 dataset with some machine learning methods (2020).
[27] M. Barstugan, U. Ozkaya, S. Ozturk, Coronavirus (covid-19) classification using ct images by machine learning methods. arXiv:2003.09424 (03 2020).
[28] T. Ozturk, M. Talo, E.A. Yildirim, U.B. Baloglu, O. Yildirim, U. Rajendra Acharya, Automated detection of covid-19 cases using deep neural networks with X-ray images, Comput. Biol. Med. 121 (2020) 103792, https://doi.org/10.1016/j.compbiomed.2020.103792.
[29] B. Sekeroglu, I. Ozsahin, Detection of covid-19 from chest x-ray images using convolutional neural networks, SLAS TECHNOL. Transl. Life Sci. Innov. 0 (0) (0) 2472630320958376, PMID: 32948098. doi: 10.1177/2472630320958376.
[30] G. Jain, D. Mittal, D. Thakur, M.K. Mittal, A deep learning approach to detect covid-19 coronavirus with X-ray images, Biocybern. Biomed. Eng. 40 (4) (2020) 1391–1405, https://doi.org/10.1016/j.bbe.2020.08.008.
[31] I. Apostolopoulos, T. Bessiana, Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks, Phys. Eng. Sci. Med. 43 (2) (2020) 635–640, https://doi.org/10.1007/s13246-020-00865-4.
[32] S. Ahuja et al., Deep transfer learning-based automated detection of covid-19 from lung ct scan slices, Appl. Intell. (2020), https://doi.org/10.1007/s10489-020-01826-w.
[33] K.E. Asnaoui, Y. Chawki, Using X-ray images and deep learning for automated detection of coronavirus disease, J. Biomol. Struct. Dyn. 0 (0) (2020) 1–12, PMID: 32397844. doi: 10.1080/07391102.2020.1767212.
[34] S. Basu, S. Mitra, N. Saha, Deep learning for screening covid-19 using chest X-ray images (2020). arXiv:2004.10507.
[35] J. Maillo, I. Triguero, F. Herrera, A mapreduce-based k-nearest neighbor approach for big data classification, in: 2015 IEEE Trustcom/BigDataSE/ISPA, vol. 2, 2015, pp. 167–172.
[36] Z. Deng, X. Zhu, D. Cheng, M. Zong, S. Zhang, Efficient knn classification algorithm for big data, Neurocomput. 195 (C) (2016) 143–148, https://doi.org/10.1016/j.neucom.2015.08.112.
[37] J. Maillo, S. Ramírez, I. Triguero, F. Herrera, knn-is: an iterative spark-based design of the k-nearest neighbors classifier for big data, Knowl.-Based Syst. 117. doi: 10.1016/j.knosys.2016.06.012.
[38] F. Wang, Q. Wang, F. Nie, W. Yu, R. Wang, Efficient tree classifiers for large scale datasets, Neurocomputing. doi: 10.1016/j.neucom.2017.12.061.
[39] S.H. Cha, Comprehensive survey on distance/similarity measures between probability density functions, Int. J. Math. Models Methods Appl. Sci. 1 (4) (20) 300–307.
[40] H. Abu Alfeilat, A. Hassanat, O. Lasassmeh, A. Tarawneh, M. Alhasanat, H. Eyal-Salman, S. Prasath, Effects of distance measure choice on k-nearest neighbor classifier performance: a review, Big Data 7. doi: 10.1089/big.2018.0175.
[41] E. Fix, J.L.H. (1951), Discriminatory analysis. nonparametric discrimination: consistency properties, Technical Report 4, USAF School of Aviation Medicine, Randolph Field, TX, USA. URL http://www.jstor.org/stable/1403797.
[42] T. Cover, P. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theory 13 (1).
[43] X. Wu, V. Kumar, J. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G.J. McLachlan, A.F.M. Ng, B. Liu, P. Yu, Z. Zhou, M. Steinbach, D. Hand, D. Steinberg, Top 10 algorithms in data mining, Knowl. Inf. Syst. 14 (2007) 1–37.
[44] J. Gou, W. Qiu, Z. Yi, X. Shen, Y. Zhan, W. Ou, Locality constrained representation-based k-nearest neighbor classification, Knowl.-Based Syst. 167 (2019) 38–52, https://doi.org/10.1016/j.knosys.2019.01.016.
[45] J. Gou, H. Ma, W. Ou, S. Zeng, Y. Rao, H. Yang, A generalized mean distance-based k-nearest neighbor classifier, Expert Syst. Appl. 115 (2019) 356–372, https://doi.org/10.1016/j.eswa.2018.08.021.
[46] X. Xia, Extreme genomic CpG deficiency in SARS-CoV-2 and evasion of host antiviral defense, Mol. Biol. Evol. 37 (9) (2020) 2699–2705. doi: 10.1093/molbev/msaa094.
[47] Y. Wang, J.M. Mao, G.D. Wang, Z.P. Luo, L. Yang, Q. Yao, K.P. Chen, Human sars-cov-2 has evolved to reduce cg dinucleotide in its open reading frames, Sci. Rep. 10 (2020) 5165–5184.
[48] H. Dinka, A. Milkesa, Unfolding sars-cov-2 viral genome to understand its gene expression regulation, Infect. Genet. Evol. 84. doi: 10.1016/j.meegid.2020.104386.
[49] The 2019 novel coronavirus resource, https://bigd.big.ac.cn/ncov, accessed: 2020-09-24.