
Computational Biology and Chemistry 88 (2020) 107329

Contents lists available at ScienceDirect

Computational Biology and Chemistry


journal homepage: www.elsevier.com/locate/cbac

A deep learning approach based on convolutional LSTM for detecting diabetes
Motiur Rahman*, Dilshad Islam, Rokeya Jahan Mukti, Indrajit Saha
Department of Physical and Mathematical Sciences, Chattogram Veterinary and Animal Sciences University, Chattogram, Bangladesh

ARTICLE INFO

Keywords: Diabetes detection; Feature selection; Parameter optimization; LSTM; Conv-LSTM; CNN

ABSTRACT

Diabetes is a chronic disease that occurs when the pancreas does not generate sufficient insulin or the body cannot effectively utilize the insulin it produces. If it remains unidentified and untreated, it can be deadly. With proper treatment, one can lead a healthy life if diabetes is detected at an early stage. Because the conventional process of detecting diabetes is tedious, there is a need for an automated system that identifies diabetes from clinical and physical data. In this study, we developed a novel diabetes classification model based on Convolutional Long Short-term Memory (Conv-LSTM), which had not yet been applied in this regard. We also applied three other popular models, namely Convolutional Neural Network (CNN), Traditional LSTM (T-LSTM), and CNN-LSTM, and compared their performance with that of our model on the Pima Indians Diabetes Database (PIDD). Significant features were extracted from the dataset using the Boruta algorithm, which returned glucose, BMI, insulin, blood pressure, and age as important features for classifying diabetes patients more accurately. We performed hyperparameter optimization using the Grid Search algorithm to find the optimal parameters for the applied models. In an initial experiment that split the dataset into separate training and testing sets, the Conv-LSTM-based model classified diabetes patients with the highest accuracy of 91.38 %. Later, using cross-validation, the Conv-LSTM model achieved the highest accuracy of 97.26 % and outperformed the other three models as well as the state-of-the-art models.

1. Introduction

Diabetes is one of the most common, deadly, and chronic diseases, and it causes many complications if it remains unidentified and untreated. Diabetes raises the concentration of glucose in the blood above the normal level. If it remains unidentified and untreated, it can seriously damage the eyes, kidneys, heart, blood vessels, and nerves (Sneha and Gangil, 2019). Two types of diabetes are found among patients: type-1 and type-2 (Allam et al., 2011). Type-1 diabetes normally affects young people, mostly those under 30 years of age. Type-1 diabetes cannot be treated effectively with oral medications alone; it requires insulin therapy (Li et al., 2018). In contrast, type-2 diabetes commonly occurs in middle-aged and older people and is not completely curable. People with type-2 diabetes can lead a normal life by managing their lifestyle and having regular checkups (Massaro et al., 2019).

A study conducted by the International Diabetes Federation in 2019 found that approximately 463 million adults are living with diabetes and projected that this number will rise to 700 million by 2045. In addition, one in every six live births is affected by diabetes during pregnancy. Despite an expenditure of USD 760 billion on diabetes, one in every two people with diabetes remains undiagnosed (Mhaskar et al., 2017). Additionally, 374 million people are at increased risk of developing diabetes. Rising living standards are mostly responsible for this unwanted increase in the number of diabetes patients.

A diabetes patient can lead a normal life with proper diagnosis and treatment. There are different measures for identifying diabetes, such as A1c, random blood sugar, fasting blood sugar, and the oral glucose tolerance test (Ramachandran, 2020). Depending on the type of diabetes, patients may check and record their blood glucose concentration as many as four times a day, or more if they are taking insulin. Identifying diabetes from a single parameter may lead to misdiagnosis and misleading decisions. High levels of vitamin E ingestion can falsely increase A1c levels, while vitamins B-9 and B-12 can falsely lower them (Guo et al., 2010). Ingestion of vitamin C may lower A1c when it is estimated by chromatography but raise it when it is estimated by electrophoresis (Clark et al., 2007). Hence, there is a need to combine different parameters to diagnose diabetes effectively. Diagnosis with the combination of glucose, BMI, diabetes pedigree function, blood pressure, age, pregnancy, skin thickness, and insulin will be more


Corresponding author.
E-mail address: motiur@cvasu.ac.bd (M. Rahman).

https://doi.org/10.1016/j.compbiolchem.2020.107329
Received 25 April 2020; Received in revised form 29 May 2020; Accepted 2 July 2020
Available online 10 July 2020
1476-9271/ © 2020 Elsevier Ltd. All rights reserved.
M. Rahman, et al. Computational Biology and Chemistry 88 (2020) 107329

effective to identify and treat diabetes.

Healthcare providers generate a huge volume of clinical data that can be processed to retrieve useful information. Machine learning algorithms help to process such large volumes of data and to extract the underlying patterns that facilitate decision making (Shi et al., 2015). Knowledge can be gathered from data when a machine learning model is trained on the observed data. Several machine learning techniques have been developed for predicting disease and discovering knowledge from medical data. A recurrent neural network unit called LSTM is widely used in different prediction problems (Sun et al., 2018). The concept of memory in LSTM helps to capture the underlying data pattern and long-term dependencies among data during training. The standard LSTM faces a problem called the vanishing gradient, which can be overcome by another LSTM architecture called Convolutional LSTM (Rahman and Adjeroh, 2019). An artificial neural network has strong predictive power that can be used to model complex prediction problems. A decision tree is a supervised machine learning algorithm that is widely used in data mining and various predictive problems. Random forest builds many trees to reach a conclusion and also has strong classification power. The prediction accuracy of a machine learning technique depends heavily on a good selection of attributes (Fortino et al., 2014). Feature selection is the process of reducing a large feature set to a subset that has more impact on the outcome. A good subset of features helps to fight the curse of dimensionality, avoid overfitting, increase generalizability, and reduce overall training time. Several feature selection algorithms are used to select the most relevant subset of features. Boruta, a feature ranking and selection technique based on the random forest algorithm, is widely used for narrowing a large feature set to a subset (Ali et al., 2019).

In this study, we employed Conv-LSTM, CNN, T-LSTM, and CNN-LSTM for predicting diabetes. The novelty of this work is that Conv-LSTM has not yet been applied to diabetes prediction, although it has performed well in various classification and prediction problems. We used the random forest-based Boruta feature selection algorithm to capture the most significant features. Different comparative analyses were conducted to find out where our model performs better and where it does not. A performance comparison shows that the Conv-LSTM-based model achieved 97.26 % accuracy, which is higher than that of the other models. We also tuned different parameters of our model to find the configuration at which it is most accurate, performing hyperparameter optimization with the grid search algorithm. In addition, we report where our model performs better and the reasons behind it, and we compare the performance of our model with that of state-of-the-art models.

The rest of the paper is organized as follows: the related literature is reviewed in Section 2; Section 3 outlines the dataset used; the models applied to classify diabetes patients are described in Section 4; Section 5 reviews the experiments conducted and the architecture of the model; results and discussion are presented in Section 6; finally, Section 7 draws the conclusion with some recommendations for future work.

2. Related research

Although the need for an automated diabetes classification system is immense, not many studies have been carried out on diabetes prediction using machine learning algorithms. Even though the accuracy of the work completed so far is quite good, there is still scope for research to improve the accuracy of diabetes patient classification. Some recent works on diabetes prediction using machine learning algorithms are described in this section.

Recently, various machine learning algorithms have shown promising outcomes in different areas of medical research. Likewise, machine learning algorithms such as Decision Tree, Random Forest, SVM, and Naive Bayes are widely used for classifying and predicting diabetes patients by observing different attributes. One study used the Pima Indians Diabetes dataset to predict diabetes at an early stage. The authors used three machine learning algorithms, namely Decision Tree, SVM, and the Naive Bayes classifier, for detecting diabetes and found that the Naive Bayes classifier outperformed the other two with an accuracy of 76.30 % (Singh et al., 2018). The artificial neural network makes a significant contribution to diabetes prediction because of its strong predictive power. Quan et al. examined different algorithms, namely decision tree, random forest, and neural network, to predict diabetes mellitus (Zou et al., 2018). They used data from a hospital in Luzhou, China, for training and testing the models and evaluated them with five-fold cross-validation. Their experimental results indicated that the random forest algorithm achieved the highest accuracy of 80.80 %. Training and testing a machine learning model with optimized features returns better accuracy than with un-optimized features. Sneha et al. selected significant features from the Pima Indians Diabetes dataset using a modified approach, and those features were used for early prediction of diabetes with SVM, random forest, naive Bayes, decision tree, and k-nearest neighbor classifiers (Sneha and Gangil, 2019). They obtained the highest diabetes prediction accuracy with the SVM classifier. Apoorva et al. applied the decision tree algorithm to predict type-2 diabetes using the PID dataset; a performance comparison with the SVM classifier indicated that the decision tree forecasted type-2 diabetes accurately (Apoorva et al., 2020). Yukai et al. developed techniques for diabetes prediction from follow-up data using SVM, decision tree, and an ensemble learning model and found that the decision tree classifier returned a lower accuracy than the other two (Li et al., 2018).

Another popular form of machine learning algorithm, the recurrent neural network (RNN), is widely used in various prediction problems. A recurrent neural unit called LSTM (long short-term memory) resolves the problem of long-term dependency among data and also performs well in diabetes prediction. Qingnan et al. proposed a deep neural network-based model using LSTM and Bi-LSTM for diabetes prediction, and a performance comparison with ARIMA and SVR showed that their proposed model outperformed both (Sun et al., 2018). A deep learning model based on solid mathematical theory and domain knowledge was suggested to predict blood glucose concentration 30 min ahead from continuous glucose monitoring system data. Another study used a recurrent neural network (RNN) and data obtained from CGM devices to predict type-1 diabetes. A performance comparison revealed that the RNN-based model performed better than the feed-forward neural network prediction model (NNM). Swapna et al. developed a methodology for classifying diabetic and normal HRV signals by incorporating long short-term memory (LSTM), convolutional neural network (CNN), and their combinations (Swapna et al., 2018). They obtained performance improvements of 0.03 % and 0.06 % with the CNN and CNN-LSTM architectures, respectively, compared to their earlier work based on SVM.

The aforementioned machine learning algorithms, such as random forest, SVM, decision tree, LSTM, ANN, CNN, RNN, ARIMA, and SVR, perform well in diabetes prediction. However, the random forest algorithm does well in classification problems but has limitations in regression problems, as it does not predict beyond the range of the training data. The decision tree algorithm is unstable, as a slight change in the data causes a large change in the structure of the decision tree (Kotsiantis, 2013). Besides, the predictive power of the decision tree algorithm is inadequate for regression problems. The support vector machine (SVM) algorithm does not perform well when the dataset is noisy (Zhang, 2012). These three algorithms perform well for classification problems, while the ANN, CNN, and LSTM algorithms perform well for regression problems. Although ANN, CNN, and LSTM do well in predictive problems, they face a problem called the vanishing gradient that makes it harder to capture the impact of the earliest stages as the length of the input sequence increases (Yamashita et al.,


2018; Agatonovic-Kustrin and Beresford, 2000; Staudemeyer and Morris, 2019). This problem has a negative impact on the output. It can be overcome by using an LSTM architecture called Convolutional LSTM, which consists of a memory cell along with four gates. The Conv-LSTM has an extra connection to the previous memory cells that allows it to capture the effect of the previous input at the current timestamp (Shi et al., 2015). The memory cell allows it to capture the impact of the earliest stages during the training phase (Rahman and Adjeroh, 2019). Besides, this additional connection to the previous memory cell, called a peephole connection, prevents vanishing gradient problems from occurring. In this paper, we developed a model based on the Convolutional LSTM to predict diabetes using selected features. The significant features were selected using Boruta, an algorithm based on the random forest algorithm.

3. Dataset

The performance of our Conv-LSTM-based model was evaluated on a popular dataset, the Pima Indians Diabetes Database (PIDD) (Kamble et al., 2016), obtained from the UCI machine learning repository. The dataset contains records of 768 female patients aged at least 21 years, of whom 268 are diabetes positive and the rest are diabetes negative. The dataset has eight predictor variables (Pregnancies, Glucose, Blood Pressure, BMI, Skin Thickness, Insulin, Diabetes Pedigree Function, and Age) for diagnostically predicting whether a patient has diabetes, and one target variable named Outcome.

During the early stage of pregnancy, the blood glucose level may increase, which leads to diabetes-related complications. Glucose is another important indicator of diabetes when its level in the blood is higher than normal. A high blood glucose level can increase blood pressure, which is a significant symptom of diabetes. Being overweight greatly increases the risk of developing type 2 diabetes. An imbalance of insulin is also an important indicator of diabetes. Since diabetes may run in families, the diabetes pedigree function provides useful information on diabetes. There is evidence of skin thickening in people suffering from insulin-dependent diabetes mellitus. The risk of type 2 diabetes increases as people get older, especially after age 45. Therefore, we can say that all the mentioned features are important for detecting diabetes. In the dataset, a value of 1 in the outcome column means "tested positive for DM" and a value of 0 means "tested negative for DM". A brief description of the dataset is shown in Tables 1 and 2.

Table 2
Hyperparameter optimization of Conv-LSTM and T-LSTM.

Model       Learning rate   Batch size   Hidden units   Epoch   Mean test score
Conv-LSTM   0.01            32           100            300     98.24
Conv-LSTM   0.01            64           100            300     97.01
Conv-LSTM   0.03            32           100            300     96.84
T-LSTM      0.01            32           100            300     91.31
T-LSTM      0.03            32           100            300     89.64
T-LSTM      0.01            64           100            300     89.17

4. Models

The aim of this study is to develop a diabetes prediction model using Convolutional LSTM, which has not yet been applied to diabetes patient classification. Recently, different machine learning techniques such as traditional LSTM, convolutional neural network (CNN), and their combinations have been widely used for diabetes patient classification based on clinically obtained parameters. These techniques achieve promising accuracy in diabetes prediction, although they face challenges such as capturing the effect of the earliest input data and vanishing gradient problems, which have adverse effects on performance. These limitations can be overcome by applying a variation of LSTM named Convolutional LSTM (Conv-LSTM), which makes it possible to capture the impact of the earliest given data. The vanishing gradient problem can be eliminated by tuning the weight values during the training phase. In this study, we applied Conv-LSTM, T-LSTM, CNN, and CNN-LSTM to the Pima Indian Diabetes dataset to observe the performance of these models in terms of accurate diabetes patient classification.

4.1. Traditional LSTM

Long Short-term Memory (LSTM), a variant of the recurrent neural network, consists of a memory cell and four gates, namely the forget gate $f$, input gate $i$, control gate $c$, and output gate $o$ (Muzaffar and Afshari, 2019). It can extract and remember the underlying data pattern, which resolves the long-term dependency problem that occurs in classical RNN algorithms. The LSTM comes in various forms, such as Traditional LSTM (T-LSTM), Peephole LSTM, and Convolutional LSTM (Conv-LSTM), that have different architectures (Zhu et al., 2019). The diagram of T-LSTM in Fig. 1 (Rahman and Siddiqui, 2019) indicates that it takes the previous cell state $h_{t-1}$, the current input vector $x_t$, and the bias $b$ as inputs and generates the current memory content $c_t$ and the current cell state $h_t$ as final outputs after performing some operations on the previous memory content $c_{t-1}$. The four gates mentioned above control the content of the memory cell.

The forget gate generates a value in the range of 0–1 that decides how much of the previous memory cell should be discarded. A value close to 0 indicates that most of the memory content of the previous timestamp will be forgotten at the current timestamp, and the opposite happens for a value close to 1. The operation of the forget gate is defined as (Rahman and Siddiqui, 2019):

$$f_t = \sigma_g(w_f x_t + u_f h_{t-1} + b_f) \tag{1}$$

The input gate determines how much input from the current timestamp will be added to the memory cell and is defined as (Rahman and Siddiqui, 2019):

$$i_t = \sigma_g(w_i x_t + u_i h_{t-1} + b_i) \tag{2}$$

The control gate manages the process of updating the memory cell content from $c_{t-1}$ to $c_t$ by considering the outputs of the forget and input gates

Table 1
Description of Pima Indian Diabetes dataset.

Sr. no.  Selected attribute          Range        Description
1.       Pregnancy                   0–17         Number of times the participant has been pregnant
2.       Glucose                     0–199        Plasma glucose concentration at 2 h in an oral glucose tolerance test
3.       Blood pressure              0–122        Diastolic blood pressure, i.e. the pressure in the arteries between heartbeats (mm Hg)
4.       Skin thickness              0–99         Triceps skinfold thickness (mm), determined by the collagen content
5.       Insulin                     0–846        2-hour serum insulin (mu U/mL)
6.       BMI                         0–67.1       Body mass index (weight in kg/(height in m)^2)
7.       Diabetes pedigree function  0.078–2.42   An attribute used in diabetes prognosis
8.       Age                         21–81        Age of participants (years)
9.       Outcome                     1/0          Diabetes class variable: 1 (yes) means the patient is diabetic, 0 (no) means the patient is not diabetic
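
As a small worked example of the class balance described in Section 3, the sketch below derives the share of each class from the counts stated there (768 records, 268 of them positive). The majority-class baseline in the last step is an added illustration, not a result reported in this study, and the constant and function names are hypothetical.

```python
# Class counts as stated in Section 3 of this paper.
TOTAL_RECORDS = 768
POSITIVE_RECORDS = 268


def class_balance(total: int, positive: int) -> dict:
    """Return class counts and proportions for a binary dataset."""
    negative = total - positive
    return {
        "positive": positive,
        "negative": negative,
        "positive_share": positive / total,
        "negative_share": negative / total,
    }


balance = class_balance(TOTAL_RECORDS, POSITIVE_RECORDS)

# Illustrative baseline: a classifier that always predicts the larger class
# is correct for every record of that class, so its accuracy equals that share.
majority_baseline_accuracy = max(balance["positive_share"], balance["negative_share"])

print(balance["negative"])                   # 500
print(round(majority_baseline_accuracy, 4))  # 0.651
```

Since roughly two thirds of the records are negative, accuracy figures on this dataset are easier to interpret alongside sensitivity and specificity.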


Fig. 1. Diagram of Traditional LSTM.

that is defined as (Rahman and Siddiqui, 2019):

$$c_t = f_t \times c_{t-1} + i_t \times \sigma_h(w_c x_t + u_c h_{t-1} + b_c) \tag{3}$$

The final output of the current timestamp is produced by the output gate, which is also responsible for updating the cell state from $h_{t-1}$ to $h_t$ (Rahman and Siddiqui, 2019):

$$o_t = \sigma_g(w_o x_t + u_o h_{t-1} + b_o) \tag{4}$$

$$h_t = o_t \times \sigma_h(c_t) \tag{5}$$

Here, $\sigma_g$ and $\sigma_h$ denote the sigmoid function and the hyperbolic tangent function, respectively; $w$ and $u$ are weight values used to avoid the vanishing gradient problem. We applied T-LSTM to classify diabetes patients using the Pima Indian Diabetes database. In our case, the inputs are given to an input layer and passed through two hidden layers, and a dense layer acts as the output layer that performs the final classification. Each layer of the model contains 100 T-LSTM units that capture the data pattern as the inputs pass through them. An attention value is estimated for each input in each layer; it defines the importance of that input in performing the final prediction. With the help of the attention vector, the dense layer makes the final prediction of whether a patient has diabetes or not.

As shown in Fig. 1, there is no connection between the gates and the previous memory content, which prevents the gates from seeing the previous memory content when the output gate is closed. This limitation has adverse effects on classification and prediction performance. The prime aim of this study is to apply Convolutional LSTM (Conv-LSTM) to classifying diabetes patients and to point out how Conv-LSTM overcomes the limitations faced by T-LSTM as well as by CNN and CNN-LSTM.

4.2. Convolutional LSTM

The earliest form of Long Short-term Memory (LSTM), known as T-LSTM, cannot access the content of its previous memory cell when its output gate is closed (Shi et al., 2015). This problem can be solved by adding an extra connection, known as a peephole connection, between each gate and the previous memory content. A variation of LSTM with peephole connections is known as Convolutional LSTM (Conv-LSTM); it allows all of the gates to use the previous memory cell content even when the output gate is closed. The diagram of Conv-LSTM in Fig. 2 (Rahman and Siddiqui, 2019) shows that its working principle is similar to that of T-LSTM. Here, each gate takes an additional input (the previous memory content $c_{t-1}$) that ensures the impact of earlier memory cells even when the output gate is closed. This connection guarantees the impact of the earliest input even when the input sequence is long. The four gates of the Conv-LSTM work as follows (Rahman and Siddiqui, 2019):

$$f_t = \sigma_g(w_f \ast x_t + u_f \ast h_{t-1} + v_f \times c_{t-1} + b_f) \tag{6}$$

$$i_t = \sigma_g(w_i \ast x_t + u_i \ast h_{t-1} + v_i \times c_{t-1} + b_i) \tag{7}$$

$$c_t = f_t \times c_{t-1} + i_t \times \sigma_h(w_c \ast x_t + u_c \ast h_{t-1} + b_c) \tag{8}$$

$$o_t = \sigma_g(w_o \ast x_t + u_o \ast h_{t-1} + v_o \times c_{t-1} + b_o) \tag{9}$$

$$h_t = o_t \times \sigma_h(c_t) \tag{10}$$

In this study, we developed a diabetes prediction model based on Conv-LSTM and applied it to the Pima Indian Diabetes dataset.

4.3. Convolutional neural network (CNN)

A convolutional neural network (CNN) is a deep learning algorithm similar to an ordinary neural network, in which inputs are given to each neuron. These neurons learn from data with the help of weights and biases by performing operations such as the dot product (Larabi-Marie-Sainte et al., 2019). CNNs were initially used in image classification, but they now perform well in other classification problems as well. To train and test a deep learning CNN model, the inputs are passed through a series of convolution layers with filters, pooling layers, and fully connected (FC) layers, and finally a Softmax function is applied to make the final prediction. The convolution layer extracts the underlying data pattern by learning the relationships among the data. The pooling layer reduces the number of parameters based on their importance. The fully connected layer, with the help of a Softmax function, produces the probability distribution over the classes that defines the final prediction result. The details of the CNN-based model for diabetes prediction are shown in Fig. 3.

The input data are passed through a convolution layer that captures the data pattern by learning different features. A feature map is then generated by performing a dot product between the inputs and the weight values. Here, a bias value is added in each step to remove the vanishing gradient problem. The convolution layer uses an activation function named ReLU (Rectified Linear Unit) that introduces non-linearity into our convolutional network. The rectified feature maps are then given to the pooling layer, which performs down-sampling. The max pooling operation is applied here; it takes the largest elements from the rectified feature maps. The down-sampled feature maps are similarly passed through a convolution layer and pooling layer that perform the same operations as the earlier convolution and pooling


Fig. 2. Convolutional LSTM is derived from the T-LSTM by taking an extra connection from ct − 1 to each gate that is known as peephole connection.
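
To make the role of the peephole connection concrete, the following minimal sketch implements one time step of Eqs. (6)–(10); setting the peephole weights $v_f$, $v_i$, and $v_o$ to zero recovers the T-LSTM update of Eqs. (1)–(5). It is an illustration under simplifying assumptions rather than the implementation used in this study: all quantities are scalars, so the `*` operator in the equations reduces to ordinary multiplication, and the names `Weights` and `lstm_step` as well as all weight values are hypothetical.

```python
import math
from dataclasses import dataclass


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Weights:
    """Scalar weights of one gate: input w, recurrent u, bias b, peephole v."""
    w: float
    u: float
    b: float
    v: float = 0.0  # v = 0 disables the peephole connection (T-LSTM behaviour)


def lstm_step(x_t, h_prev, c_prev, wf, wi, wc, wo):
    """One time step following Eqs. (6)-(10) in the scalar case."""
    f_t = sigmoid(wf.w * x_t + wf.u * h_prev + wf.v * c_prev + wf.b)         # Eq. (6)
    i_t = sigmoid(wi.w * x_t + wi.u * h_prev + wi.v * c_prev + wi.b)         # Eq. (7)
    c_t = f_t * c_prev + i_t * math.tanh(wc.w * x_t + wc.u * h_prev + wc.b)  # Eq. (8)
    o_t = sigmoid(wo.w * x_t + wo.u * h_prev + wo.v * c_prev + wo.b)         # Eq. (9)
    h_t = o_t * math.tanh(c_t)                                               # Eq. (10)
    return h_t, c_t, (f_t, i_t, o_t)


# Hypothetical weights; h_prev = 0 mimics a previously closed output gate.
plain = Weights(w=0.5, u=0.5, b=0.0)         # no peephole
peep = Weights(w=0.5, u=0.5, b=0.0, v=1.0)   # with peephole

_, c_plain, (f_plain, _, _) = lstm_step(0.0, 0.0, 2.0, plain, plain, plain, plain)
_, c_peep, (f_peep, _, _) = lstm_step(0.0, 0.0, 2.0, peep, peep, plain, peep)

print(round(f_plain, 3))  # 0.5
print(round(f_peep, 3))   # 0.881
```

With $x_t = 0$ and $h_{t-1} = 0$, the gates without a peephole receive no information about $c_{t-1}$ and stay at 0.5, whereas the peephole term lets the stored memory content influence the gates directly.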

layer. Then, the feature map is flattened into a vector and fed into a fully connected layer, as in an ordinary neural network. The fully connected layer combines these vectors to create a model, and finally the Softmax function classifies the patient as diabetes positive or not.

4.4. CNN-LSTM

CNN and LSTM each have strong predictive power on their own. Here, we employed CNN to better capture the hidden features of the dataset and combined it with LSTM. CNN-LSTM is a combination of CNN layers and LSTM layers that offers the advantages of both. Fig. 4 shows the architecture of the CNN-LSTM-based model for classifying diabetes patients. In our application, we feed the inputs into a convolution layer that extracts features from them and adds non-linearity to our convolutional network with the help of the ReLU activation function. The resulting rectified feature maps are then passed through a pooling layer that performs down-sampling, using the max pooling operation to pick the most activated features. Next, the down-sampled feature maps are passed through a series of LSTM layers that handle the sequence processing. Lastly, a fully connected layer represents the feature vector and uses it for classification or regression with the help of the Softmax function.

4.5. Boruta feature selection

Feature selection plays a very important role in determining the performance of a machine learning model. Very often, a modern dataset contains too many features, most of which are irrelevant to the outcome, and their relevance is not known in advance (Ali et al., 2019). The presence of irrelevant features in a dataset has several disadvantages: many machine learning algorithms show a drop in accuracy, slow down because they consume too many resources, and become inconvenient to handle. Hence, selecting a small relevant feature set from a large one is desirable for obtaining the best possible classification result. In this study, we applied the Boruta feature selection algorithm to the Pima Indian Diabetes database; it returned glucose, BMI, insulin, blood pressure, and age as important features and pregnancy, diabetes pedigree function, and skin thickness as unimportant features for classifying diabetes patients. Boruta is a feature selection algorithm built on the random forest classifier. It adds randomness to the system and collects results from the group of randomized samples, which helps to reduce the misleading impact of random fluctuations and correlations. The addition of extra randomness therefore enables us to divide all attributes into important and unimportant feature sets. The Boruta algorithm works as follows (Kursa and Rudnicki, 2010):

1 Add randomness to the provided dataset by creating shuffled copies of all features, which are called shadow features.
2 Run a random forest classifier on the extended dataset and measure the feature importance (Z score) of each feature, where a higher score means a more important feature.
3 At every iteration, find the maximum Z score among the shadow attributes and check whether a real feature has a higher Z score than

Fig. 3. Architecture of the developed CNN-based model for diabetes prediction.
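
The layer sequence described in Section 4.3 (convolution, ReLU, max pooling, flattening, fully connected layer, Softmax) can be traced on a single feature vector with the sketch below. It is a toy forward pass under stated assumptions and not the architecture of Fig. 3: the convolution is one-dimensional with a single hypothetical kernel, and the input values, kernel, dense weights, and class encoding are made up for illustration.

```python
import math


def conv1d(x, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation): dot product of the kernel with each window."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def dense(values, weights, biases):
    """Fully connected layer: one weight row and one bias per output class."""
    return [sum(v * w for v, w in zip(values, row)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Five illustrative, already scaled feature values; not taken from the dataset.
x = [0.8, 0.2, 0.5, 0.9, 0.1]
feature_map = relu(conv1d(x, kernel=[1.0, -1.0, 0.5]))  # three rectified activations
pooled = max_pool1d(feature_map, size=2)                # one down-sampled value
probs = softmax(dense(pooled, weights=[[1.0], [-1.0]], biases=[0.0, 0.0]))
predicted_class = probs.index(max(probs))  # assumed encoding: 0 = negative, 1 = positive
```

In a trained model the kernel and dense weights would be learned from data; here they only serve to show how each layer transforms the output of the previous one.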


Fig. 4. Architecture of the developed model based on CNN-LSTM for diabetes prediction.

the maximum Z score of its shadow features.
4 A feature with a lower Z score than the maximum Z score of its shadow features is deemed unimportant and is permanently removed from the information system.
5 Finally, the algorithm stops either when all features are confirmed or rejected, or when it reaches a specified limit of random forest runs.

Boruta differs from traditional feature selection algorithms in that it captures all features that are to some extent relevant to the outcome, whereas traditional feature selection algorithms rely on a small subset of features that yields a minimal error on a chosen classifier.

defines the ability of a model to accurately classify the true positive samples, while the specificity of a model is its capacity to identify the true negative samples. The accuracy of a model is the ratio of the number of patients correctly classified by the model to the total number of patients. The formulas are as follows (Zou et al., 2018):

$$\text{Sensitivity (SN)} = \frac{TP}{TP + FN} \tag{11}$$

$$\text{Specificity (SP)} = \frac{TN}{TN + FP} \tag{12}$$

$$\text{Accuracy (ACC)} = \frac{TN + TP}{TN + TP + FP + FN} \tag{13}$$
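
Eqs. (11)–(13) translate directly into code. The sketch below computes the three measures from confusion-matrix counts; the function name is hypothetical and the example counts are made up for illustration, not results from this study.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, specificity and accuracy as in Eqs. (11)-(13)."""
    return {
        "sensitivity": tp / (tp + fn),                # Eq. (11)
        "specificity": tn / (tn + fp),                # Eq. (12)
        "accuracy": (tn + tp) / (tn + tp + fp + fn),  # Eq. (13)
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, tn=50, fp=5, fn=5)
print(metrics)
```

Reporting sensitivity and specificity next to accuracy shows whether a high accuracy comes from both classes or mainly from the larger one.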
In Eqs. (11)–(13), true positive (TP) is the number of patients classified as positive who are actually positive, true negative (TN) is the number of patients predicted as negative who are actually negative, false positive (FP) is the number of patients classified as positive who are actually negative, and false negative (FN) is the number of patients identified as negative who are actually positive. These quantities are often estimated to assess the classification quality of models.

4.6. Hyperparameter optimization

The performance of a machine learning algorithm depends heavily on the optimum values of its parameters. The optimum values are obtained by trying different values and observing where the algorithm returns better accuracy. Hyperparameters are the parameters of a model that are not updated during the learning phase but are used to shape the model to achieve the best accuracy.
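
Grid search over the settings reported in this section amounts to enumerating every combination of the candidate values and keeping the best-scoring one. The sketch below does this for the four-parameter grid used for T-LSTM and Conv-LSTM. The function `evaluate` is a hypothetical stand-in for training a model and returning its cross-validated accuracy; the toy scoring rule inside it is invented purely so that the example runs and does not reflect any result of this study.

```python
from itertools import product

# Candidate values as listed in Section 4.6 for T-LSTM and Conv-LSTM.
GRID = {
    "learning_rate": [0.01, 0.03, 0.05],
    "batch_size": [16, 32, 64, 128],
    "hidden_units": [50, 100, 150],
    "epochs": [200, 250, 300, 350],
}


def evaluate(params: dict) -> float:
    """Hypothetical placeholder: a real run would train the model with `params`
    and return its mean cross-validation accuracy."""
    return -abs(params["learning_rate"] - 0.01) - abs(params["batch_size"] - 32) / 1000


def grid_search(grid: dict, score_fn):
    """Score every combination exhaustively; return (best_params, best_score, n_tried)."""
    names = list(grid)
    best_params, best_score, n_tried = None, float("-inf"), 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        n_tried += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_tried


best_params, best_score, n_tried = grid_search(GRID, evaluate)
print(n_tried)  # 144 combinations = 3 * 4 * 3 * 4
```

Adding the two kernel sizes used for CNN and CNN-LSTM doubles the grid to 288 combinations, which illustrates how quickly the cost of an exhaustive search grows with each additional hyperparameter.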
Hyperparameter optimization, also known as hyperparameter tuning, is the problem of selecting a set of optimal parameters that lowers the cost function of the model. We performed hyperparameter tuning on the four applied models using the grid search approach. Grid search, also called exhaustive search, looks through every combination of the candidate hyperparameter values. The performance of each combination is measured using cross-validation on the training set or evaluation on a held-out validation set. After trying all possible combinations of hyperparameters, the grid search algorithm returns the settings that achieved the highest accuracy in the validation process, along with the obtained accuracy. Hyperparameter optimization of T-LSTM and Conv-LSTM was performed over the following values of four parameters:

[learning rate = 0.01, 0.03, 0.05], [batch size = 16, 32, 64, 128], [hidden units = 50, 100, 150], [epoch = 200, 250, 300, 350]

Besides, the hyperparameter tuning of CNN and CNN-LSTM was performed over the following settings:

[learning rate = 0.01, 0.03, 0.05], [batch size = 16, 32, 64, 128], [hidden units = 50, 100, 150], [epoch = 200, 250, 300, 350], [kernel size = 3*3, 5*5]

After performing hyperparameter optimization, the grid search algorithm returns the optimal hyperparameters along with the highest accuracy of each model. We have performed grid search algorithm over the

5. Experiment and architecture

Convolutional LSTM has not yet been applied to classifying diabetes patients. Hence, we used it to develop a novel model for diabetes detection. The main concern before developing a machine learning model is to enhance its performance. The performance of a machine learning model depends heavily on how relevant the selected features are for predicting the outcome correctly. Removing redundant features and including relevant ones has a positive impact on the accuracy of a model (Ramaswami and Bhaskaran, 2009). Besides, models face an overfitting problem when too many features are used for training, which lessens their capacity to generalize to new data (Karabulut et al., 2012). Additionally, the computational complexity increases, which requires more memory and hardware. Hence, we performed feature selection on the Pima Indian Diabetes dataset to extract the most relevant features, which have a strong positive impact on identifying diabetes patients. We applied the Boruta feature selection algorithm, which is an improved version of the Random Forest importance measure algorithm (Degenhardt et al., 2019). The Boruta feature selection algorithm takes multi-variable relationships into consideration when deciding on all relevant features, and it performs well for both regression and
model by importing GridSearchCV model from sklearn package in py- classifications problems (Fortino et al., 2014). Here, the applied Boruta
thon. algorithm returns Glucose, BMI, Blood Pressure, Insulin and Age as
relevant features for building an accurate and robust model in order to
4.7. Evaluation classify diabetes patients. Before performing the Boruta feature selec-
tion algorithm, we have cleaned data since some features like glucose,
The performance of the applied models was evaluated by measuring blood pressure, skin thickness, insulin, and BMI in our dataset have
sensitivity (SN), specificity (SP), and accuracy (ACC). The sensitivity some zero (0) values that are dealt with the median value of that

6
M. Rahman, et al. Computational Biology and Chemistry 88 (2020) 107329

Fig. 5. Architecture of the developed model based on Conv-LSTM for diabetes prediction.

attribute. The architecture of our model is shown in Fig. 5. 6. Results and discussion
We have incorporated three layers of Convolutional LSTM (Conv-
LSTM) to capture the underlying features of data and a fully connected The study was aimed at applying the Traditional LSTM (T-LSTM),
dense layer that represents the captured hidden data patterns. Initially, Convolutional Neural Network (CNN), CNN-LSTM, and Convolutional
the preprocessed data are passed through an Embedding layer that LSTM (Conv-LSTM) separately for building diabetes patients classifi-
transforms each given input into a distributed representation of length cation models. The performance of these four models were observed
100, which is then fed through three Con-LSTM layers. Each layer of over the Pima Indian Diabetes dataset. We have performed hy-
Conv-LSTM contains hundred hidden units and processes the received perparameter optimization using grid search algorithm to choose the
input in both directions in order to capture the underlying data features optimal hyperparameters that achieved the highest accuracy. The re-
effectively. The first layer of three Conv-LSTM layers acts as an input sults of grid search over the four models is shown in Tables 1–3. From
layer while the rest two layers are hidden layers. While the data are all possible combinations, the top 3 combinations that have achieved
passed through these three layers, a conditional probability known as the highest test score are shown in table.
attention is calculated for each input that determines the importance of Tables 1 and 2 shows that the both Conv-LSTM and T-LSTM have
that input in performing patient classification. The estimated attention obtained the highest test score of 98.24 and 91.31 percent respectively
for each input is stored in an attention vector that is used in the fully with the optimal hyperparameters like learning rate 0.01, batch size 32,
connected dense layer for doing accurate classification. Then, the cap- hidden units 100 and epoch 300. According to Table 3, the CNN-LSTM
tured feature vector is fed through the fully connected dense layer that and CNN models have achieved highest accuracy of 94.87 and 89.02
represents the data pattern and uses that pattern for final classification percent respectively with the same configurations of above two models
of diabetes patients. Here, a Softmax function is used that generates a by keeping kernel size 5*5. Hence, we have considered learning rate
probability distribution and this probability distribution defines whe- 0.01, batch size 32, hidden units 100, epoch 300 and kernel size 5*5 for
ther a patient is diabetes positive or not. each respective model to perform the final classification. Here, the
Initially, we have applied our model over the Pima Indian Diabetes learning rate defines how much of weight is adjusted in the model with
dataset by splitting the dataset into separate training and testing da- respect to the loss gradient; batch size is the number of sample inputs
taset. Later, we have observed the performance of our model by using passed through the model at a time; epoch defines how many times a
the five-fold cross validation technique over the same dataset. Our one machine learning model traverses through the same dataset.
objective is to find out the optimal parameter configuration where the We have applied these models over the mentioned dataset in two
model returns highest accuracy. Therefore, We have splitted the dataset separate ways such as keeping the training-testing set separately and
on three different points for building training and testing dataset such using five-fold cross validation technique to evaluate the performance.
as 80 % train and 20 % test, 75 % train and 25 % test, 70 % train and 30 Experimental results of both cases are shown in Table 4 that shows our
% test. Three different batch sizes such as 16, 32 and 64 are considered Conv-LSTM-based model has performed better in both cases compared
while three different learning rates like 0.01, 0.03 and 0.05 are taken to other three models. When the dataset has been splitted at point 0.2
into account for tuning the model. We have performed all experiments for training and testing data, the Conv-LSTM model has achieved the
for epoch 200, 250 and 300 separately to figure out the impact of the highest accuracy of 91.38 % that is greater than the other three models.
number of epochs on the model's performance.
Here, the size of the dataset is not so large. Therefore, there is a
chance of occurring overfitting problem that need to be dealt with. We Table 3
Hyperparameter optimization of CNN-LSTM and CNN.
have tuned the model by adding more layers and found that when the
hidden layer is more than two, the overfitting problem occurs. Hence, CNN-LSTM
we have kept two hidden layers in the model. Additionally, we have
Learning rate Mini Batch Hidden Epoch Kernel size Mean test
added dropout layers that randomly sets output features of a layer to
size units score
zero. The addition of dropout layers have exhibited that the model with
dropout layers starts overfitting at later than the model without dropout 0.01 32 100 300 5*5 94.87
layers. We have implemented all four models such as T-LSTM, CNN, 0.01 32 100 300 3*3 92.74
0.01 64 100 300 5*5 91.28
CNN-LSTM, and Conv-LSTM using a Python library called Keras which
CNN
is backed by TensorFlow. 0.01 32 100 300 5*5 89.02
0.01 64 100 300 3*3 87.19
0.03 32 100 300 3*3 85.98
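To give a sense of the size of the search spaces listed in Section 4.6, the following minimal Python sketch enumerates them with the standard library only. It is an illustration under stated assumptions, not the GridSearchCV-based code used in this study: the names LSTM_GRID, CNN_GRID, enumerate_grid, and best_setting are hypothetical, and score_fn merely stands in for training and validating a model.

```python
from itertools import product

# Search spaces copied from Section 4.6 of the paper.
LSTM_GRID = {
    "learning_rate": [0.01, 0.03, 0.05],
    "batch_size": [16, 32, 64, 128],
    "hidden_units": [50, 100, 150],
    "epochs": [200, 250, 300, 350],
}
# CNN and CNN-LSTM add the kernel size as a fifth hyperparameter.
CNN_GRID = {**LSTM_GRID, "kernel_size": [(3, 3), (5, 5)]}


def enumerate_grid(grid):
    """Return every hyperparameter combination as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


def best_setting(grid, score_fn):
    """Exhaustive search: score every combination and keep the best one.

    score_fn is a hypothetical callable (settings -> validation score);
    in practice it would train a model and return its validation accuracy.
    """
    return max(enumerate_grid(grid), key=score_fn)


lstm_candidates = enumerate_grid(LSTM_GRID)
cnn_candidates = enumerate_grid(CNN_GRID)
print(len(lstm_candidates), len(cnn_candidates))  # 144 288
```

The T-LSTM and Conv-LSTM grid contains 3 × 4 × 3 × 4 = 144 combinations, and the two kernel sizes double this to 288 for CNN and CNN-LSTM. Each combination requires a full training and validation run, which is why exhaustive search becomes costly as the grid grows.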

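The two evaluation protocols used in this section, a separate training-testing split and five-fold cross validation, can be illustrated with a small pure-Python sketch. Several points are assumptions rather than details reported in the paper: the record count of 768 is the commonly cited size of the Pima Indians Diabetes Database, the shuffling uses a fixed seed, no stratification by class is applied, and the function names holdout_split and five_fold_indices are hypothetical.

```python
import random


def holdout_split(n_samples, test_fraction, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def five_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


N_RECORDS = 768  # assumed size of the Pima Indians Diabetes Database

# The three splitting points considered in Section 5.
for fraction in (0.20, 0.25, 0.30):
    train_idx, test_idx = holdout_split(N_RECORDS, fraction)
    print(fraction, len(train_idx), len(test_idx))

# Every record is used for validation exactly once across the five folds.
for train_idx, val_idx in five_fold_indices(N_RECORDS):
    print(len(train_idx), len(val_idx))
```

With a hold-out split, only the test portion contributes to the reported score, whereas in five-fold cross validation each record is validated once and the scores are averaged over the folds.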

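The evaluation measures of Section 4.7 can be computed directly from the four confusion-matrix counts. The sketch below is a minimal illustration: specificity and accuracy follow Eqs. (12) and (13), sensitivity uses the standard definition TP / (TP + FN), and the function names and toy labels are hypothetical and do not come from the study's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for a binary classification task."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred != positive and truth != positive:
            tn += 1
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, and accuracy as fractions in [0, 1]."""
    return {
        "sensitivity": tp / (tp + fn),                 # standard definition
        "specificity": tn / (tn + fp),                 # Eq. (12)
        "accuracy": (tn + tp) / (tn + tp + fp + fn),   # Eq. (13)
    }


# Toy labels for illustration only (1 = diabetes positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts, metrics)
```

Multiplying each value by 100 gives percentages of the kind reported in Tables 4 and 5.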
The Sensitivity and Specificity scores in Table 4 indicate that Conv-LSTM predicts positive and negative patients with an accuracy of 92.19 and 92.41 percent, respectively, which is higher than that of the other three models. Besides, the results also indicate that the model based on Conv-LSTM has classified diabetes patients more accurately when five-fold cross validation is used instead of keeping the training and testing sets separate.

Table 4
Diabetes prediction accuracy of different models with selected features.

Evaluation technique           Model      Accuracy (ACC)  Sensitivity (SN)  Specificity (SP)
Separate training-testing set  T-LSTM     85.69           84.75             85.70
Separate training-testing set  CNN        82.14           83.04             82.34
Separate training-testing set  CNN-LSTM   88.47           87.78             89.47
Separate training-testing set  Conv-LSTM  91.38           92.19             92.41
Five-fold cross-validation     T-LSTM     90.89           89.69             91.78
Five-fold cross-validation     CNN        88.73           86.79             88.07
Five-fold cross-validation     CNN-LSTM   92.06           93.17             92.11
Five-fold cross-validation     Conv-LSTM  97.26           97.28             97.09

We have also applied these models to classify diabetes patients using all eight features of the Pima Indian Diabetes dataset (Pregnancies, Glucose, Blood Pressure, BMI, Skin Thickness, Insulin, Diabetes Pedigree Function, and Age) in order to find the impact of the Boruta feature selection algorithm on the performance of our model. The results of these experiments are shown in Table 5, which indicates that all these models classify diabetes patients more accurately when the significant features are used instead of all eight features. The highest accuracy with all features is 94.23 %, obtained by Conv-LSTM with the five-fold cross validation technique, which is less than the accuracy obtained by the model when only the significant features are used.

Table 5
Diabetes prediction accuracy of different models with all features.

Evaluation technique           Model      Accuracy (ACC)  Sensitivity (SN)  Specificity (SP)
Separate training-testing set  T-LSTM     83.65           83.69             84.07
Separate training-testing set  CNN        81.47           81.25             80.98
Separate training-testing set  CNN-LSTM   84.31           84.26             85.04
Separate training-testing set  Conv-LSTM  86.61           85.96             86.25
Five-fold cross-validation     T-LSTM     91.69           90.86             91.09
Five-fold cross-validation     CNN        89.69           90.08             90.41
Five-fold cross-validation     CNN-LSTM   91.65           91.02             93.69
Five-fold cross-validation     Conv-LSTM  94.23           93.98             94.04

We have compared the performance of our developed Convolutional LSTM-based model with that of the state-of-the-art models, as shown in Table 6. Here, all referred models in Table 6 were applied over the Pima Indian Diabetes dataset. The comparison indicates that our Convolutional LSTM-based model has classified diabetes patients more accurately, while the decision tree and the combination of SVM and NN have classified with a lower accuracy compared to our model. The reason behind the higher performance of the Conv-LSTM model is its ability to capture long data dependencies, which is quite hard using the other aforementioned models when the dataset is large.

Table 6
Performance comparison of our model with state-of-the-art models.

Model name                                               Accuracy (%)  Reference
Deep Learning Architecture                               88.41         (Ashiquzzaman et al., 2017)
ANN                                                      90.34         (Naz and Ahuja, 2020)
Decision Tree                                            96.62         (Naz and Ahuja, 2020)
LSTM                                                     86            (Massaro et al., 2019)
Recurrent Deep Neural Network (RNN)                      81            (Ashiquzzaman et al., 2017)
Support Vector Machine (SVM) and Neural Network (NN)     96.09         (Gill and Mittal, 2016)
Random Forest                                            94            (Yuvaraj and SriPreethaa, 2019)
Naïve Bayes                                              91            (Yuvaraj and SriPreethaa, 2019)
Convolutional LSTM                                       97.26         Our Study

We have performed several experiments to outline the combined effects of these parameters on classification accuracy, which are shown in Fig. 6. The figure shows that batch size and learning rate (Fig. 6(a)), batch size and splitting point (Fig. 6(b)), and batch size and epoch number (Fig. 6(c)) have significant combined impacts on the performance of the model. The combined effects of splitting point and epoch number (Fig. 6(d)), learning rate and epoch number (Fig. 6(e)), and learning rate and splitting point (Fig. 6(f)) are less important with regard to the model's accuracy.

Fig. 6. Combined effects of batch size, epoch number, learning rate, and splitting point over the accuracy.

In addition, the performance of these four applied models has been reported for different epoch numbers from 100 to 400. Fig. 7 shows that the four models have achieved better accuracy, whatever the epoch number, when five-fold cross validation is used instead of keeping the training and testing data separate. Fig. 7 also indicates that each of the four models has provided the highest accuracy when trained for about 300 epochs. The lower accuracy elsewhere is due to underfitting and overfitting, which occur when the models are trained for fewer or more than 300 epochs, respectively. When overfitting occurs, the model tries to memorize the data and cannot generalize to new data efficiently. When underfitting occurs, the model performs poorly on both the training data and new data.

Fig. 7. Performance observation of the applied models over different epoch numbers.

From these discussions, we can conclude that the Conv-LSTM model has classified the diabetes patients more accurately than the other three models. The reason is the architecture of Conv-LSTM, which enables the model to capture long-term data dependencies with the help of an extra connection called peephole, which is absent in the traditional LSTM. The CNN-LSTM combination performs better than the individual CNN and LSTM because it integrates the power of both CNN and LSTM to make a final decision. In the CNN-LSTM combination, the CNN helps to extract the features and the LSTM helps to remember long data patterns efficiently, which makes the model more capable of generalizing to new data.

7. Conclusions

Diabetes is caused by unbalanced insulin secretion, its impaired biological effects, or both, and becomes visible when the glucose level in blood is higher than the normal level. Because a reliable machine learning model for diabetes detection is needed, we have applied four different models, namely Conv-LSTM, CNN-LSTM, T-LSTM, and CNN, for classifying diabetes patients over the Pima Indian Diabetes dataset. The novelty of this work is that the Convolutional LSTM (Conv-LSTM) is applied for the first time in this regard, as it performs well in various classification and prediction problems. We have applied these models over the mentioned dataset in two different ways: by keeping the training and testing data separate and by using the five-fold cross validation technique to evaluate the models. We have utilized the Boruta algorithm, which is based on the random forest, to select the most significant features among the eight features of the database. We have performed hyperparameter optimization to find the optimal values of parameters. After experimental analysis, we have found that each of the four models performs better when the five-fold cross validation technique is used instead of splitting the dataset for training and testing. We have also reported that the Conv-LSTM model outperforms the other three models in both cases in terms of classification accuracy. Comparison of the developed model with the state-of-the-art models shows the superiority of our model.

Declaration of Competing Interest

The authors declare that there is no conflict of interest with any person or organization.

References

Agatonovic-Kustrin, S., Beresford, R., 2000. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J. Pharm. Biomed. Anal. 22 (5), 717–727.
Ali, Z., Hussain, I., Faisal, M., Elashkar, E.E., Gani, S., Shehzad, M.A., 2019. Selection of appropriate time scale with Boruta algorithm for regional drought monitoring using multi-scaler drought index. Tellus A Dyn. Meteorol. Oceanogr. 71 (1), 1604057.
Allam, F., Nossai, Z., Gomma, H., Ibrahim, I., Abdelsalam, M., 2011. A recurrent neural network approach for predicting glucose concentration in type-1 diabetic patients. IFIP Advances in Information and Communication Technology, vol. 363 AICT (PART 1), pp. 254–259.
Apoorva, S., Aditya, K., Snigdha, S.P., Darshini, P., Sanjay, H.A., 2020. Prediction of Diabetes Mellitus Type-2 Using Machine Learning. pp. 364–370.
Ashiquzzaman, A., et al., 2017. Reduction of overfitting in diabetes prediction using deep learning neural network. Lecture Notes in Electrical Engineering, vol. 449. pp. 35–43.
Clark, N.G., Fox, K.M., Grandy, S., 2007. Symptoms of diabetes and their association with the risk and presence of diabetes: findings from the study to help improve early evaluation and management of risk factors leading to diabetes (SHIELD). Diabetes Care 30 (11), 2868–2873.
Degenhardt, F., Seifert, S., Szymczak, S., 2019. Evaluation of variable selection methods for random forests and omics data sets. Brief. Bioinform. 20 (2), 492–503.
Fortino, V., Kinaret, P., Fyhrquist, N., Alenius, H., Greco, D., 2014. A robust and accurate method for feature selection and prioritization from multi-class OMICs data. PLoS One 9 (9), e107801.
Gill, N.S., Mittal, P., 2016. A Computational Hybrid Model With Two Level Classification Using SVM and Neural Network for Predicting the Diabetes Disease.
Guo, D., Zhang, D., Li, N., Zhang, L., Yang, J., 2010. Diabetes identification and classification by means of a breath analysis system. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6165 LNCS, pp. 52–63.
Kamble, A.K., Manza, R.R., Rajput, Y.M., 2016. Review on Diagnosis of Diabetes in Pima Indians.
Karabulut, E.M., Özel, S.A., İbrikçi, T., 2012. A comparative study on the effect of feature selection on classification accuracy. Procedia Technol. 1 (January), 323–327.
Kotsiantis, S.B., 2013. Decision trees: a recent overview. Artif. Intell. Rev. 39 (4), 261–283.
Kursa, M.B., Rudnicki, W.R., 2010. Feature selection with the boruta package. J. Stat. Softw. 36 (11), 1–13.
Larabi-Marie-Sainte, Aburahmah, Almohaini, Saba, 2019. Current techniques for diabetes prediction: review and case study. Appl. Sci. (Basel) 9 (21), 4604.
Li, Y., Li, H., Yao, H., 2018. Analysis and study of diabetes follow-up data using a data-mining-based approach in new urban area of Urumqi, Xinjiang, China, 2016-2017. Comput. Math. Methods Med. 2018.
Massaro, A., Maritati, V., Giannone, D., Convertini, D., Galiano, A., 2019. LSTM DSS automatism and dataset optimization for diabetes prediction. Appl. Sci. (Basel) 9 (17), 3532.
Mhaskar, H.N., Pereverzyev, S.V., van der Walt, M.D., 2017. A deep learning approach to diabetic blood glucose prediction. Front. Appl. Math. Stat. 3 (July).
Muzaffar, S., Afshari, A., 2019. Short-term load forecasts using LSTM networks. Energy Procedia 158, 2922–2927.
Naz, H., Ahuja, S., 2020. Deep learning approach for diabetes prediction using PIMA Indian dataset. J. Diabetes Metab. Disord. (April), 1–13.
Rahman, S.A., Adjeroh, D.A., 2019. Deep learning using convolutional LSTM estimates biological age from physical activity. Sci. Rep. 9 (1), 1–15.
Rahman and Siddiqui, 2019. An optimized abstractive text summarization model using peephole convolutional LSTM. Symmetry (Basel) 11 (10), 1290.
Ramachandran, A., 2014. Know the signs and symptoms of diabetes. Indian J. Med. Res. 140 (5), 579–581.
Ramaswami, M., Bhaskaran, R., 2009. A Study on Feature Selection Techniques in Educational Data Mining. Dec.
Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W., Woo, W., 2015. Convolutional LSTM network: a machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015 (January), 802–810.
Singh, P.P., Prasad, S., Das, B., Poddar, U., Choudhury, D.R., 2018. Classification of diabetic patient data using machine learning techniques. Advances in Intelligent Systems and Computing, vol. 696. pp. 427–436.
Sneha, N., Gangil, T., 2019. Analysis of diabetes mellitus for early prediction using optimal features selection. J. Big Data 6 (1).
Staudemeyer, R.C., Morris, E.R., 2019. Understanding LSTM – a Tutorial Into Long Short-term Memory Recurrent Neural Networks. Sep.
Sun, Q., Jankovic, M.V., Bally, L., Mougiakakou, S.G., 2018. Predicting blood glucose with an LSTM and Bi-LSTM based deep neural network. In: 2018 14th Symposium on Neural Networks and Applications. NEUREL 2018.
Swapna, G., Vinayakumar, R., Soman, K.P., 2018. Diabetes detection using deep learning algorithms. ICT Express 4 (4), 243–246.
Yamashita, R., Nishio, M., Do, R.K.G., Togashi, K., 2018. Convolutional neural networks: an overview and application in radiology. Insights Imaging 9 (4), 611–629.
Yuvaraj, N., SriPreethaa, K.R., 2019. Diabetes prediction in healthcare systems using machine learning algorithms on Hadoop cluster. Cluster Comput. 22 (1), 1–9.
Zhang, Y., 2012. Support vector machine classification algorithm and its application. Communications in Computer and Information Science, vol. 308 CCIS (PART 2), pp. 179–186.
Zhu, G., et al., 2019. Redundancy and attention in convolutional LSTM for gesture recognition. IEEE Trans. Neural Networks Learn. Syst. (June).
Zou, Q., Qu, K., Luo, Y., Yin, D., Ju, Y., Tang, H., 2018. Predicting diabetes mellitus with machine learning techniques. Front. Genet. 9 (November).
