
Detection of Malaria Disease Using Image Processing and Machine Learning

Md. Maruf Hasan1[0000−0002−4408−3472], Sabiha Islam2[0000−0001−6322−857X], Ashim Dey3[0000−0001−7319−8917],
Annesha Das4[0000−0001−7182−9316], and Sharmistha Chanda Tista5[0000−0002−2520−8030]

Computer Science & Engineering, Chittagong University of Engineering and Technology, Chittagong-4349, Bangladesh
1,2 {maruf.cuet.cse16,munshat.cuet16}@gmail.com,
3,4,5 {ashim,annesha,tista chanda}@cuet.ac.bd

Abstract. Malaria is an infectious disease that claims hundreds of thousands of lives each year. A standard laboratory malaria diagnosis requires a careful study of both healthy and infected red blood cells. Malaria can be diagnosed by spreading a drop of the patient's blood on a slide as a blood smear and examining it under a microscope. The quality of the blood smear also influences the accuracy of the classification and detection of malaria. This results in a large number of errors, which is not acceptable. The goal of this research is to create a computer-aided method for the automatic detection of malaria parasites using image processing and machine learning techniques. Uninfected and parasitized blood cells have been classified using handcrafted features extracted from red blood cell images. We have implemented AdaBoost, K-Nearest Neighbor, Decision Tree, Random Forest, Support Vector Machine, and Multinomial Naive Bayes machine learning models on a dataset of 27,558 cell images. Among these algorithms, AdaBoost, Random Forest, Support Vector Machine, and Multinomial Naive Bayes achieved an accuracy of about 91%. Furthermore, the ROC curve indicates that the Random Forest classification model performs best. We hope that by decreasing the requirement for human intervention throughout the detection process, this approach can greatly improve the efficiency of malaria disease detection.

Keywords: Malaria disease, Blood smear images, Image processing, Machine learning, Computer-aided diagnosis.

1 Introduction
Malaria has become one of the most severe infectious diseases for humankind. The disease is transmitted mainly through the bite of Anopheles mosquitoes. According to Wikipedia, out of 400 species, only 30 species of Anopheles mosquitoes are malaria vectors. Nowadays, it is a serious public health issue around the globe, particularly in third-world countries. As per the WHO (World Health Organization), 1.5 billion malaria cases have been averted since 2000, but 409,000 people died of malaria in 2019 [1]. The transmission of the malaria parasite depends on climate conditions. The disease spreads rapidly during the rainy season in particular, because this is the breeding season for Anopheles mosquitoes. Transmission grows more intense when the temperature rises to the point that a mosquito's life span is extended. Because of this temperature dependence, in many tropical areas such as Latin America, Asia, and Africa, the malaria spreading rate is around 90%. According to the WHO, in 2019, about 50% of the world's population was at risk of malaria. Malaria is a leading cause of death in Sub-Saharan Africa. The Western Pacific, the Eastern Mediterranean, South-East Asia, and the Americas have all been recognized as high-risk zones by the WHO [2]. Malaria is often predominant in remote areas where it is hard to find proper medical treatment. It is critical to detect malaria early and administer appropriate treatment; otherwise, the disease can be fatal.
Examination of blood smears by qualified microscopists is one typical method of detecting malaria-infected erythrocytes. This is a traditional diagnostic method used in laboratories. Microscopic diagnosis is the most widely used malaria diagnosis procedure, taking only 15 minutes to complete. However, the efficiency and accuracy of this method depend on the degree of human proficiency, which is often hard to find; without it, accuracy fluctuates. Polymerase Chain Reaction (PCR) is the most sensitive and specific approach for recognizing malaria parasites and is more typically used for species identification [3]. As an alternative to microscopy, PCR allows sensitive and specific detection of Plasmodium species DNA from peripheral blood. The Rapid Diagnostic Test (RDT) is another diagnostic method, used for reliable detection of malaria infections in distant locations with limited access to high-quality microscopy services [4]. Manual diagnosis is unsuccessful in some cases because effective results depend on the experience and knowledge of microscopists, and human error is inevitable. If more efficient automated diagnostic methods were available for malaria detection, the disease could be controlled more easily. Recently, many automated machine learning and deep learning approaches have emerged for detecting this disease, and they are claimed to be more efficient than conventional approaches [5–9].
In this work, we have used machine learning algorithms with automatic image recognition technologies for detecting parasite-infected red blood cells in standard microscope slide images. We have used image smoothing, grayscale conversion, and feature extraction. The main objectives of our work are:

– To locate the region of interest and extract key features from standard microscopic images of red blood cells using image processing techniques.
– To train various machine learning models using the extracted features for
classifying healthy and parasitized red blood cells.
– To find the most suitable approach based on different evaluation metrics for
detecting malaria disease.
The rest of the paper is arranged as follows: Section 2 presents the related works we have investigated. Our methodology is illustrated in Section 3. Section 4 presents the obtained results in detail. Finally, Section 5 concludes the paper.

2 Related Work
Malaria is a life-threatening disease that has attracted deep research interest among scientists all over the world. Different techniques, methods, and algorithms have been used to detect parasitized blood cells in recent times. In the domain of machine learning, handcrafted features are mostly used for decision making. In earlier work, feature extraction depended on morphological factors [10], and classification was performed with Support Vector Machine (SVM) and Principal Component Analysis (PCA).
In disease recognition studies, Convolutional Neural Networks (CNN) have achieved promising results in terms of efficiency and accuracy [5]. CNNs have been found to be much more effective than the SVM classifier for extracting image features [6]. In [7], a 16-layer CNN model achieved a detection accuracy of 97.37%, which is claimed to be superior to a transfer learning model that extracts features from the optimal layer of a pretrained model and reaches an accuracy of 91.99%. The CNN model was also explored for extracting features from 96 x 96 resolution cell images in [8]. Among the CNN architectures, the GoogleNet, ResNet, and VGGNet models showed accuracy rates in the range of 90% to 96%. The authors used Contrast Limited Adaptive Histogram Equalization (CLAHE) to pre-process the images and enhance their quality. In [9], the authors introduced the Multi-Magnification Deep Residual Network, an enhanced deep learning approach for the categorization of microscopic blood smear images. They handled the problems of vanishing gradients, degradation, and low-quality images by combining batch normalization and individual residual units.
Multiple image pre-processing techniques can be used, for instance image enhancement and feature extraction. In [11], images were converted to grayscale, and then Gray Level Co-occurrence Matrix (GLCM), Histogram of Oriented Gradients (HOG), and Local Binary Pattern (LBP) were applied for feature extraction. With these pre-processing methods, the highest accuracy among the different machine learning algorithms was 97.93%, obtained with the Support Vector classification model. In [12], the authors used different machine learning algorithms such as Cubic SVM, Linear SVM, and Cosine KNN; Cubic SVM achieved the highest accuracy of 86.1% among them. They tested their system on only 110 thin films.
For choosing a suitable and highly precise model for detecting the malaria parasite from a microscopic blood smear, autoencoder training from deep learning showed an accuracy of 99.23% with nearly 4600 flops [2]. Specifically, this model gave an accuracy of 99.51% with 28 x 28 images, whereas 32 x 32 images gave an accuracy of 99.22%. The authors gave up very little accuracy, only 0.0029, to obtain a slightly higher image resolution for sensitive, specific, and precise performance on a smartphone, as well as a low-cost phone and web application for portable malaria diagnosis.

3 Methodology
First, we have identified a series of steps and designed a methodology to achieve
our goal. Our overall methodology is represented in Fig. 1. A publicly available
dataset was used in this work. The techniques for obtaining data, preprocessing
and model training are covered in the following subsections.

Fig. 1: Block diagram of our methodology.

3.1 Dataset Description


The first step is to collect images of blood smears from malaria patients. We used a publicly available dataset from Kaggle [13]. This dataset has 27,558 blood cell images divided into two classes: 13,779 images of cells infected with malaria and 13,779 images of cells that are not infected. The original source of this dataset is Chittagong Medical College Hospital, Bangladesh [14]. Thin blood smear slides were collected and photographed from 150 patients infected with P. falciparum, a malaria parasite, and from 50 healthy patients. Fig. 2 shows some sample images from the dataset.

Fig. 2: Sample images (a) Uninfected and (b) Parasitized.

3.2 Data Preprocessing


Transforming raw data before applying a machine learning algorithm is called preprocessing. Preprocessing is an important phase in ML because it determines the quality of the data and the useful detail that can be retrieved from it, which greatly affects the performance and correctness of a model. Image preprocessing takes an image as input and performs operations on it such as image sharpening, image filtering, and image enhancement. Initially, we used the original images as input, as shown in Fig. 3, and resized them to 120x120. Images can be smoothed with different blurring techniques provided by OpenCV, such as Averaging, Gaussian Blurring, Median Blurring, and Bilateral Filtering. Blurring techniques are beneficial for removing noise from images. Here, smoothing is accomplished using Gaussian blurring, as illustrated in Fig. 3, which is very effective at removing Gaussian noise from images. After smoothing, we used OpenCV and Python to convert the images to grayscale. We are more interested in the patterns in these images because the color as a whole carries little information.

3.3 Feature Extraction


Fig. 3: Data processing steps.

In this step, we have identified our region of interest from the preprocessed images. To locate the infected areas in these images, we have attempted to detect all contours. Simply put, a contour is a curve that connects all continuous points (along a boundary) that have the same intensity or color. Contours are an effective tool for analysing shape as well as for object detection and recognition. In our work, features are extracted in this step by obtaining the five largest contour areas or bounded regions. We obtained higher accuracy with the five largest contour areas; when we considered fewer than five, accuracy was reduced, and when we considered more than five, accuracy remained the same. For uninfected images, 12544 of the 13779 images have 1 contour area, and only 273 images have 5 contour areas. For parasitized images, out of 13779 images, only 1585 images have 1 contour area and 1585 images have 5 contour areas.
3.4 Model Training

To detect uninfected and parasitized blood smears, six classifiers have been se-
lected for training. They are AdaBoost (AD), K-Nearest Neighbor (KNN), Sup-
port Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and
Multinomial Naive Bayes (MNB).
To train and evaluate our models, we have used 70% of the images for training and 30% for testing. To find out which model is better for detecting malaria, the performance of each model is evaluated using graphical and statistical indicators, including the confusion matrix, accuracy, F1-score, recall, precision, and ROC curve. The confusion matrix is an array containing the number of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).
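
A minimal sketch of the 70/30 split and the six classifiers, using Scikit-learn, could look like this. The feature matrix here is synthetic, and all hyperparameters are library defaults; both are assumptions, since the paper does not report its settings.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the extracted features: five non-negative
# "contour areas" per image (MultinomialNB requires non-negative inputs).
rng = np.random.default_rng(0)
uninfected = rng.uniform(0, 50, size=(200, 5))
parasitized = rng.uniform(0, 50, size=(200, 5)) + np.array([0, 100, 80, 60, 40])
X = np.vstack([uninfected, parasitized])
y = np.array([0] * 200 + [1] * 200)  # 0 = uninfected, 1 = parasitized

# 70% of the samples for training, 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

models = {
    "AD": AdaBoostClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "DT": DecisionTreeClassifier(random_state=42),
    "RF": RandomForestClassifier(random_state=42),
    "MNB": MultinomialNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # test accuracy
    print(f"{name}: {scores[name]:.3f}")
```

The accuracies printed for this toy data say nothing about the real dataset; the sketch only shows the workflow.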

– Accuracy: Accuracy is the ratio of correctly classified samples to all samples, regardless of whether the sample is positive or negative, as given in Eq. (1).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

– Precision: Precision is the fraction of samples predicted as positive that are actually positive, as given in Eq. (2).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

– Recall: Recall is the fraction of actually positive samples that are correctly predicted as positive, as given in Eq. (3).

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

– F1 score: The F1 metric summarizes the classification performance of the system. As given in Eq. (4), it is calculated from the recall and precision rates.

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \tag{4}$$

4 Result Analysis
Fig. 4: Confusion matrices.

After following the preprocessing and feature extraction steps described earlier, the classifiers are trained using the Scikit-learn library. The performance of these classifiers is compared in Table 1. The overall classification accuracy varies between about 84% and 91%. According to the classification report in Table 1, the performance of SVM, AD, RF, and MNB is slightly better in terms of test accuracy and the other reported metrics. These four classifiers achieved an average accuracy of 90.63%. Fig. 4 shows the confusion matrices of the implemented classifiers. We can see that SVM predicts 3733 images correctly as parasitized and 3703 images correctly as uninfected, AD predicts 3700 images correctly as parasitized and 3734 images correctly as uninfected, RF predicts 3735 images correctly as parasitized and 3694 images correctly as uninfected, and MNB predicts 3692 images correctly as parasitized and 3713 images correctly as uninfected. We then explored the stacking ensemble technique by combining the best performing models, but Table 1 shows that its test accuracy is 90.71%, which is lower than the test accuracy of the RF classifier.
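
A stacking ensemble of AD, RF, SVM, and MNB can be assembled with Scikit-learn's `StackingClassifier`, as in the sketch below. The logistic-regression meta-learner, the default hyperparameters, and the synthetic features are assumptions; the paper does not say which final estimator or settings were used.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Synthetic non-negative features standing in for the five contour areas.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.uniform(0, 50, size=(150, 5)),
    rng.uniform(0, 50, size=(150, 5)) + np.array([0, 100, 80, 60, 40]),
])
y = np.array([0] * 150 + [1] * 150)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ("ad", AdaBoostClassifier(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svm", SVC()),
        ("mnb", MultinomialNB()),
    ],
    # Assumed meta-learner; it is trained on cross-validated base-model outputs.
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
stack_acc = stack.score(X_test, y_test)
print(f"stacking accuracy: {stack_acc:.3f}")
```

Stacking does not guarantee an improvement over the best base model, which is consistent with the result reported in Table 1.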
Table 1: Classification report in weighted average (%).

Model                      Accuracy  Precision  Recall  F1-Score
DT                         83.62     83.63      83.63   83.62
AD                         90.59     90.57      90.64   90.58
KNN                        88.05     88.04      88.08   88.04
RF                         90.76     90.75      90.77   90.76
MNB                        90.54     90.53      90.58   90.54
SVM                        90.64     90.63      90.65   90.64
Ensemble (AD+RF+SVM+MNB)   90.71     90.70      90.73   90.71

Fig. 5: ROC curve.

To select the best model among the four models with similar accuracy, we have further investigated the AUC-ROC curve, as shown in Fig. 5. The objective of the AUC-ROC curve is to present the model's overall detection rate. The horizontal axis in the diagram indicates the model's false-positive rate, while the vertical axis indicates the model's true-positive rate. We can conclude that the performance of the Random Forest Classifier is noticeably superior in terms of AUC as measured by the ROC curve.
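
The ROC curve and its AUC for a single classifier can be obtained as in the following sketch, which uses Scikit-learn's `roc_curve` and `roc_auc_score`. The overlapping synthetic classes and the default Random Forest settings are assumptions chosen so that the curve is not trivially perfect.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Two overlapping synthetic classes, five features each.
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(50, 20, size=(300, 5)),
    rng.normal(70, 20, size=(300, 5)),
])
y = np.array([0] * 300 + [1] * 300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
# Probability of the positive (parasitized) class is used as the ranking score.
proba = clf.predict_proba(X_test)[:, 1]

# fpr is the horizontal axis and tpr the vertical axis of the ROC plot.
fpr, tpr, thresholds = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)
print(f"AUC = {auc:.3f}")
```

Repeating this for each classifier and comparing the AUC values is one way to rank models whose accuracies are nearly identical.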

5 Conclusion

Malaria is an infectious mosquito-borne disease, and its diagnosis requires thorough and careful examination of red blood smears. This diagnosis procedure is not only time-consuming, but its accuracy also relies on the expertise of pathologists. Nowadays, machine learning has become a popular strategy for handling the most complicated real-world issues. In this work, we have utilized machine learning along with image processing for reliable diagnosis of malaria. First, handcrafted features were extracted by identifying the region of interest in each image of a dataset of 27,558 microscopic images. For this purpose, the five largest contours have been considered from the preprocessed images. Then, six machine learning models along with an ensemble model were trained using the extracted features. We successfully classified parasitized and healthy (non-parasitized) blood smear images with the highest accuracy of about 91%. In the future, we aim to incorporate deep learning approaches in this work for more accurate analysis and classification of red blood smear images.
References
1. WHO: Fact sheet: World malaria report 2020. World Health Organization (2020), https://www.who.int/teams/global-malaria-programme/reports/world-malaria-report-2020, (Accessed on 10/23/2021)
2. Fuhad, K.M.F., Tuba, J.F., Sarker, M.R.A., Momen, S., Mohammed, N., Rahman, T.: Deep learning based automatic malaria parasite detection from blood smear and its smartphone based application. Diagnostics 10(5) (2020), https://www.mdpi.com/2075-4418/10/5/329
3. Hänscheid, T., Grobusch, M.P.: How useful is PCR in the diagnosis of malaria? Trends in Parasitology 18(9), 395–398 (2002)
4. Wongsrichanalai, C., Barcus, M., Sinuon, M., Sutamihardja, A., Wernsdorfer, W.: A review of malaria diagnostic tools: Microscopy and rapid diagnostic test (RDT). The American Journal of Tropical Medicine and Hygiene 77, 119–27 (01 2008)
5. Khan, S., Islam, N., Jan, Z., Ud Din, I., Rodrigues, J.J.P.C.: A novel deep learning based framework for the detection and classification of breast cancer using transfer learning. Pattern Recognition Letters 125, 1–6 (2019), https://www.sciencedirect.com/science/article/pii/S0167865519301059
6. Lecun, Y., Bengio, Y.: Convolutional networks for images, speech, and time-series (01 1995)
7. Liang, Z., Powell, A., Ersoy, I., Poostchi, M., Silamut, K., Palaniappan, K., Guo, P., Hossain, M.A., Sameer, A., Maude, R.J., Huang, J.X., Jaeger, S., Thoma, G.: CNN-based image analysis for malaria diagnosis. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). pp. 493–496 (2016)
8. Militante, S.V.: Malaria disease recognition through adaptive deep learning models of convolutional neural network. In: 2019 IEEE 6th International Conference on Engineering Technologies and Applied Sciences (ICETAS). pp. 1–6 (2019)
9. Pattanaik, P., Mittal, M., Khan, M.Z., Panda, S.: Malaria detection using deep residual networks with mobile microscopy. Journal of King Saud University - Computer and Information Sciences (2020), https://www.sciencedirect.com/science/article/pii/S1319157820304171
10. Linder, N., Turkki, R., Walliander, M., Mårtensson, A., Diwan, V., Rahtu, E., Pietikäinen, M., Lundin, M., Lundin, J.: A malaria diagnostic tool based on computer vision screening and visualization of Plasmodium falciparum candidate areas in digitized blood smears. PLOS ONE 9(8), 1–12 (08 2014), https://doi.org/10.1371/journal.pone.0104855
11. Kumari, U., Memon, M., Narejo, S., Afzal, M.: Malaria disease detection using machine learning (01 2021), https://www.researchgate.net/publication/348408910_Malaria_Disease_Detection_Using_Machine_Learning
12. Kumari, U., Memon, M., Narejo, S., Afzal, M.: Malaria detection using image processing and machine learning. IJERT NTASU–2020 (Volume 09–Issue 03) (2021)
13. Arunava: Malaria cell images dataset. https://www.kaggle.com/iarunava/cell-images-for-detecting-malaria, (Last accessed 23, 10, 2021)
14. Rajaraman, S., Antani, S., Poostchi, M., Silamut, K., Hossain, M.A., Maude, R.J., Jaeger, S., Thoma, G.R.: Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ 6 (2018)
