1 Introduction
Malaria is one of the most severe infectious diseases affecting humankind. It is
transmitted mainly through the bite of Anopheles mosquitoes. According to
Wikipedia, only 30 of the 400 species of Anopheles mosquitoes are malaria
vectors. Today, malaria is a serious public health issue around the globe,
particularly in third-world countries. According to the World Health
Organization (WHO), 1.5 billion malaria cases have been averted since 2000, yet
409,000 people died of malaria in 2019 [1]. The transmission of the malaria
parasite depends on climate conditions. The disease spreads especially rapidly
during the rainy season, which is the breeding season for Anopheles mosquitoes.
Transmission grows more intense when the temperature rises to the point that a
mosquito's life span is extended. With regard to temperature, in many tropical
areas such as Latin America, Asia, and Africa, the spreading rate of malaria is
around 90%. According to the WHO, about 50% of the world's population was at
risk of malaria in 2019. Malaria is a leading cause of death in Sub-Saharan
Africa. The Western Pacific, the Eastern Mediterranean, South-East Asia, and
the Americas have all been recognized as high-risk zones by the WHO [2]. Malaria
is often predominant in remote areas where proper medical treatment is hard to
find. It is critical to detect malaria early and administer appropriate
treatment; otherwise, the disease can be fatal.
One typical method of detecting malaria is for qualified microscopists to
examine blood smears for infected erythrocytes. Alongside clinical diagnosis,
this is a traditional diagnostic method used in laboratories. Microscopic
diagnosis is the most widely used malaria diagnosis procedure, taking only 15
minutes to complete. However, its efficiency and accuracy depend on the
proficiency of the examiner, and skilled microscopists are often hard to find;
without them, accuracy fluctuates. Polymerase Chain Reaction (PCR) is the most
sensitive and specific approach for recognizing malaria parasites and is more
typically used for species identification [3]. As an alternative to microscopy,
PCR allows sensitive and specific detection of Plasmodium species DNA from
peripheral blood. The Rapid Diagnostic Test (RDT) is another alternative that
allows reliable detection of malaria infections in distant locations with
limited access to high-quality microscopy services [4]. Manual diagnosis is
unsuccessful in some cases because effective results depend on the experience
and knowledge of microscopists, and human error is inevitable. If more efficient
automated diagnostic methods were available for malaria detection, the disease
could be controlled more easily. Recently, many automated machine learning and
deep learning approaches have been proposed for detecting this disease, and they
are claimed to be more efficient than conventional approaches [5–9].
In this work, we use machine learning algorithms with automatic image
recognition technologies to detect parasite-infected red blood cells in standard
microscope slide images. Our pipeline applies image smoothing, grayscale
conversion, and feature extraction. The main objectives of our work are:
– To locate the region of interest and extract key features from standard
microscopic images of red blood cells using image processing techniques.
– To train various machine learning models using the extracted features for
classifying healthy and parasitized red blood cells.
– To find the most suitable approach based on different evaluation metrics for
detecting malaria disease.
The rest of the paper is organized as follows: Section 2 presents the related
work we have investigated. Section 3 describes our methodology. Section 4
presents the obtained results in detail. Finally, Section 5 concludes the paper.
2 Related Work
Malaria is a life-threatening disease that has attracted deep research interest
among scientists all over the world. Different techniques, methods, and
algorithms have been used to detect parasitized blood cells in recent times.
In the domain of machine learning, handcrafted features are mostly used for
decision making. Previously, feature extraction depended on morphological
factors [10], and classification was performed with Support Vector Machine (SVM)
and Principal Component Analysis (PCA).
In disease recognition studies, Convolutional Neural Networks (CNNs) have
achieved promising results in terms of efficiency and accuracy [5]. CNNs have
been found to be much more effective than the SVM classifier for image feature
extraction [6]. In [7], where features were extracted from the optimal layer of
a pretrained model, the 16-layer CNN model achieved a detection accuracy of
97.37%, which is claimed to be superior to the transfer learning model with an
accuracy of 91.99%. The CNN model was also explored for extracting features from
96 x 96 resolution cell images in [8]. Among the CNN architectures, the
GoogleNet, ResNet, and VGGNet models showed accuracy rates in the range of 90%
to 96%. The authors used Contrast Limited Adaptive Histogram Equalization
(CLAHE) to pre-process the images and enhance their quality. In [9], the authors
introduced the Multi-Magnification Deep Residual Network, an enhanced deep
learning approach for the classification of microscopic blood smear images. They
handled the problems of vanishing gradients, degradation, and low-quality images
by combining batch normalization and individual residual units.
Multiple image pre-processing techniques can be used, for instance image
enhancement and feature extraction. In [11], images were converted to grayscale,
and then the Gray Level Co-occurrence Matrix (GLCM), Histogram of Oriented
Gradients (HOG), and Local Binary Pattern (LBP) were applied for feature
extraction. With these pre-processing methods, the highest accuracy among the
different machine learning algorithms was 97.93%, obtained with the Support
Vector classification model. In [12], the authors used different machine
learning algorithms such as Cubic SVM, Linear SVM, and Cosine KNN, of which
Cubic SVM achieved the highest accuracy of 86.1%. They tested their system on
only 110 thin films.
To choose a suitable and highly precise model for detecting the malaria parasite
in microscopic blood smears, autoencoder training from deep learning was used in
[2], showing an accuracy of 99.23% with nearly 4600 flops. Specifically, this
model gave an accuracy of 99.51% with 28 x 28 images and 99.22% with 32 x 32
images. The authors sacrificed very little accuracy, only 0.0029, to obtain a
slightly higher image resolution for sensitive, specific, and precise
performance on a smartphone, as well as a low-cost phone and web application for
portable malaria diagnosis.
3 Methodology
First, we have identified a series of steps and designed a methodology to achieve
our goal. Our overall methodology is represented in Fig. 1. A publicly available
dataset was used in this work. The techniques for obtaining data, preprocessing
and model training are covered in the following subsections.
effective tool for analysing shape as well as for object detection and recognition.
In our work, features are extracted in this step by obtaining the five largest
contour areas, or bounded regions. We obtained the highest accuracy with the
five largest contour areas: considering fewer than five reduced the accuracy,
while considering more than five left it unchanged. Among the 13779 uninfected
images, 12544 have 1 contour area and only 273 have 5 contour areas. Among the
13779 parasitized images, only 1585 have 1 contour area and 1585 have 5 contour
areas.
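The idea of keeping the five largest areas as a fixed-length feature vector can
be sketched as follows. This is a minimal pure-Python illustration under stated
assumptions, not the pipeline used in this work: it counts the pixels of each
4-connected foreground region in a binary mask as a stand-in for a library
contour-area routine, and it pads the vector with zeros when an image has fewer
than five regions. The names region_areas and largest_area_features, the zero
padding, and the toy mask are all hypothetical.

```python
from collections import deque


def region_areas(mask):
    """Return the pixel count of every 4-connected foreground region.

    `mask` is a list of rows containing 0 (background) or 1 (foreground).
    Pixel counting is an assumed stand-in for a contour-area routine.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one region.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def largest_area_features(areas, k=5):
    """Keep the k largest areas in descending order, zero-padded to length k.

    Zero padding is an assumption made for this sketch.
    """
    top = sorted(areas, reverse=True)[:k]
    return top + [0] * (k - len(top))


# Toy mask with three regions of sizes 4, 2 and 1 (hypothetical data).
toy_mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
features = largest_area_features(region_areas(toy_mask))
print(features)
```

With this kind of representation, every image yields five numbers regardless of
how many regions it contains, which is the shape a tabular classifier expects.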
3.4 Model Training
To detect uninfected and parasitized blood smears, six classifiers have been se-
lected for training. They are AdaBoost (AD), K-Nearest Neighbor (KNN), Sup-
port Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and
Multinomial Naive Bayes (MNB).
To train and evaluate our models, we use 70% of the images for training and 30%
for testing. To find out which model is best at detecting malaria, the
performance of each classifier is evaluated using graphical and statistical
indicators, including the confusion matrix, accuracy, F1-score, recall,
precision, and ROC curve. The confusion matrix is an array containing the
numbers of true positives (TP), false positives (FP), false negatives (FN),
and true negatives (TN).
\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}
\]
– Precision: Precision is defined as the ratio of correctly predicted positive
samples to all samples predicted as positive, as illustrated in the following
formula.
\[
\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}
\]
– Recall: Recall is defined as the ratio of correctly predicted positive samples
to all samples that are actually positive, as illustrated in the following
formula.
\[
\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}
\]
– F1 score: The F1 score describes the overall classification performance of the
system. As illustrated in the following formula, it is calculated as the
harmonic mean of the recall and precision rates.
\[
F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
   = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \tag{4}
\]
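The four metrics above can be evaluated directly from the confusion-matrix
counts. The sketch below is a minimal pure-Python illustration, not the
evaluation code used in this work; the function names, the convention of
reporting 0.0 when a denominator is zero, and the ten example labels are
assumptions made for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Evaluate Eqs. (1)-(4) from the four confusion-matrix counts."""
    def ratio(num, den):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical labels: 1 = parasitized, 0 = uninfected.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(tp, fp, fn, tn)
print(tp, fp, fn, tn)
print(metrics)
```

In this toy case one parasitized cell is missed and one uninfected cell is
flagged, so precision and recall coincide and the F1 score equals both.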
4 Result Analysis
After the preprocessing and feature extraction steps described earlier, the
classifiers are trained using the Scikit-learn library. The performance of these
classifiers is compared in Table 1. The overall classification performance
varies between 84% and 91%. According to Table 1, SVM, AD, RF, and MNB perform
slightly better in terms of test accuracy and classification report. These
classifiers achieved an average accuracy of 90.63%. Fig. 4 shows the confusion
matrices of the implemented classifiers. SVM correctly predicts 3733 images as
parasitized and 3703 as uninfected, AD correctly predicts 3700 as parasitized
and 3734 as uninfected, RF correctly predicts 3735 as parasitized and 3694 as
uninfected, and MNB correctly predicts 3692 as parasitized and 3713 as
uninfected. We then explored the stacking ensemble technique by combining the
best-performing models, but Table 1 shows that its test accuracy is 90.71%,
which is lower than the test accuracy of the RF classifier.
Fig. 4: Confusion matrices.
To select the best model among the four models with similar accuracy, we
further investigated the AUC-ROC curve shown in Fig. 5. The AUC-ROC curve
summarizes a model's overall detection ability. The horizontal axis of the
diagram represents the model's false-positive rate, while the vertical axis
represents its true-positive rate. We can conclude that the
Fig. 5: ROC curve.
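An ROC curve of the kind shown in Fig. 5 is obtained by sweeping the decision
threshold over the classifier scores and recording the false-positive and
true-positive rates at each step; the AUC is the area under the resulting
curve. The following pure-Python sketch illustrates that construction with the
trapezoidal rule. It is not the code used to produce Fig. 5, and the function
names, labels, and scores are hypothetical.

```python
def roc_points(y_true, scores, positive=1):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    n_pos = sum(1 for y in y_true if y == positive)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    # Lowering the threshold admits samples in order of descending score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Process all samples that share one score before emitting a point.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores: higher means "more likely parasitized".
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```

An AUC of 1.0 corresponds to a classifier that ranks every parasitized sample
above every uninfected one, while 0.5 corresponds to random ranking, which is
why the AUC is a convenient tie-breaker between models with similar accuracy.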
5 Conclusion