
Bonfring International Journal of Data Mining, Vol. 4, No. 2, June 2014
ISSN 2277 - 5048 | © 2014 Bonfring
Abstract--- Diabetes mellitus (DM) is a chronic, common,
life-threatening syndrome that occurs all around the world. It is
characterized by hyperglycemia caused by abnormalities in
insulin secretion, which in turn result in an abnormal rise in
blood glucose level. In recent years, the impact of diabetes
mellitus has increased to a great extent, especially in
developing countries like India. This is mainly due to irregular
food habits, such as those of many IT professionals. Thus, early
diagnosis and classification of this deadly disease has become
an active area of research in the last decade. A number of
techniques have been developed to deal with this disease.
Numerous clustering and classification techniques are available
in the literature to visualize temporal data and identify trends
for controlling diabetes mellitus. This survey presents an
analytical study of several algorithms that diagnose and
classify diabetes mellitus data effectively. The existing
algorithms are analyzed thoroughly to identify their advantages
and limitations, and their performance is evaluated to determine
the best approach. The best of the existing approaches is
identified, and a solution is suggested to improve the overall
performance of the diagnosis process.
Keywords--- Data Mining, Diabetes Mellitus, Clustering,
Classification

I. INTRODUCTION
The occurrence of diabetes mellitus with certain
complications has been used to determine the
diagnostic cut-points for diabetes, especially using data from
epidemiological investigations which have resulted in several
other disorders [1,2].
The major symptoms and effects of diabetes mellitus
include long-term damage, dysfunction, and functional
abnormalities in the eyes, kidneys, nerves, heart, and blood
vessels [3]. Thus, it is considered one of the most important
health issues and a major economic cost in countries such as
the United States and India [4].
The occurrence and prevalence of diabetes in the United
States are well understood [5, 6].
Assigning a type of diabetes to an individual mostly depends
on the circumstances at the time of diagnosis. It is observed
that most diabetic patients do not fit neatly into a single
category [7].

S. Peter, CEO and Founder, Softwings Technologies, Coimbatore, India.

The three major categories of diabetes mellitus are:
Type 1 diabetes: This type occurs due to the failure of
the body to produce insulin.
Type 2 diabetes: This type results from insulin
resistance, where the cells cannot use insulin
properly.
Gestational diabetes: This occurs when pregnant
women without previously diagnosed diabetes have a
high blood glucose level during pregnancy. It may
lead to the development of type 2 DM.
In recent years, the number of diabetic patients has
increased drastically, mainly due to the aging population and
irregular Western food habits. Genetic inheritance is the main
cause of both type 1 and type 2 diabetes.
The main aim of treating diabetes is to control its acute
complications and to prevent its chronic complications. For
effective diagnosis of diabetes, the main factor to be
considered is the risk of diabetic complications, which must be
assessed early and accurately.
Recently, new classification and diagnostic criteria
have been presented by the American Diabetes Association
(ADA), the World Health Organization (WHO) and the Japan
Diabetes Society (JDS). This survey illustrates the major
points of the report of the Committee on the classification and
diagnostic criteria of diabetes mellitus.
The investigation strongly suggests that new and
efficient diagnostic and classification techniques have to be
adopted for early and efficient diagnosis of diabetes mellitus.
The new diagnostic techniques should be thoroughly validated.
Numerous techniques have been developed for the
diagnosis of diabetes mellitus. Most of them use clustering
and classification for effective diagnosis of the disease.
However, there is always scope for improvement, and several
techniques are still being developed to overcome the
limitations of the existing ones.
This paper presents an analytical study on the existing
techniques available for diabetes mellitus. The characteristic
features of the approaches are investigated to develop a better
approach for the early and efficient diagnosis of the disease.



An Analytical Study on Early Diagnosis and
Classification of Diabetes Mellitus
S. Peter


II. LITERATURE SURVEY
This section discusses the existing techniques and
algorithms used for the diagnosis of diabetes mellitus. Each
algorithm has its own advantages and limitations. This section
presents an analytical study of the features of the existing
techniques.
A. Neural Networks for the Diagnosis of Diabetes Mellitus
Jaafar and Ali [8] presented a technique for the diagnosis
of diabetes mellitus using an artificial neural network (ANN).
Their investigation is intended to assist medical practitioners
in determining the status of the disease. The back-propagation
neural network algorithm is used for training and testing on
768 records, of which 268 are diagnosed with diabetes. The
inputs given to the network include the number of pregnancies,
plasma glucose concentration, blood pressure (BP), skin fold
thickness, serum insulin, body mass index (BMI), diabetes
pedigree function and age. The network identifies whether a
person has diabetes mellitus. It is observed from the results
that the neural network approach provides significant results
with higher accuracy.
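The back-propagation training described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation of [8]: the network size (four hidden units), the learning rate, the number of epochs and the synthetic eight-feature records are all assumptions made for the example, since the original data and settings are not reproduced in this survey.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp never overflows.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_records(n_per_class, rng):
    """Hypothetical stand-in data: 8 features scaled to [0, 1], binary label."""
    data = []
    for label, centre in ((0, 0.3), (1, 0.7)):
        for _ in range(n_per_class):
            x = [min(1.0, max(0.0, rng.gauss(centre, 0.1))) for _ in range(8)]
            data.append((x, label))
    rng.shuffle(data)
    return data


class TinyMLP:
    """One hidden layer of sigmoid units and a single sigmoid output."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Error at the output (gradient of cross-entropy w.r.t. the output sum).
        delta_out = y - target
        for j, hj in enumerate(h):
            # Propagate the error back through the old output weight.
            delta_h = delta_out * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * delta_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
            self.b1[j] -= lr * delta_h
        self.b2 -= lr * delta_out


def mean_loss(net, data):
    eps = 1e-12
    total = 0.0
    for x, t in data:
        _, y = net.forward(x)
        total -= t * math.log(y + eps) + (1 - t) * math.log(1.0 - y + eps)
    return total / len(data)


rng = random.Random(0)
records = make_synthetic_records(50, rng)
net = TinyMLP(n_in=8, n_hidden=4, rng=rng)
initial_loss = mean_loss(net, records)
for _ in range(200):  # epochs
    for x, t in records:
        net.train_step(x, t, lr=0.5)
final_loss = mean_loss(net, records)
accuracy = sum((net.forward(x)[1] >= 0.5) == bool(t)
               for x, t in records) / len(records)
```

The sketch evaluates on its own training records only to show that the loss falls; a real study would hold out test records, as the cited work does.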
Caipo Zhang et al [9] used a fuzzy-based structure as the
diagnostic framework for gestational diabetes mellitus. The
Sugeno model is obtained by training a back propagation
neural network. The main limitation of the back propagation
neural network is that it quickly gets trapped in a local
optimum, so simulated annealing is used to optimize it, which
yields an approximately global optimal solution. In this
diagnostic framework, an Adaptive Neuro-Fuzzy Inference System
(ANFIS) has been used instead of a Radial Basis Function (RBF)
neural network, and back propagation has been used for training
the ANFIS structure. The results show the efficiency of ANFIS
when compared with conventional neural networks.
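The role of simulated annealing in escaping local optima can be illustrated with a small sketch. It is not the procedure of [9], which applies annealing to network weights; here a hypothetical one-dimensional loss with several local minima stands in for the training error, and the temperature schedule, step size and starting point are assumed values.

```python
import math
import random


def toy_loss(x):
    # Hypothetical multimodal loss: a parabola with sinusoidal ripples.
    return x * x + 10.0 * math.sin(3.0 * x)


def simulated_annealing(loss, x0, steps=2000, t0=1.0, cooling=0.995,
                        step_size=0.5, seed=0):
    rng = random.Random(seed)
    x, fx = x0, loss(x0)
    best_x, best_fx = x, fx
    t = t0
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, step_size)
        fc = loss(candidate)
        # Always accept improvements; accept a worse move with a
        # probability that shrinks as the temperature falls.
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = candidate, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
        t *= cooling
    return best_x, best_fx


start = 4.0
best_x, best_loss = simulated_annealing(toy_loss, start)
```

Because worse moves are occasionally accepted while the temperature is high, the search can leave a shallow basin that plain gradient descent would stay in; the best point seen is tracked separately so the result is never worse than the start.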
In recent years, several research works have been carried
out to treat Type-1 diabetes through a closed-loop insulin
delivery framework. The main goal of such work is to analyze
the use of a brain-inspired neural fuzzy system as a
controller that delivers insulin in a closed-loop model for the
treatment of Type-1 diabetes mellitus. Phee et al [20]
presented the Pseudo-Outer Product based Fuzzy Neural
Network, which uses the Yager rule of inference to deliver a
suitable amount of insulin in the presence of varying meal
disturbances and so achieve normoglycemia for a simulated
Type-1 diabetic patient.
B. Support Vector Machines for the Diagnosis of Diabetes Mellitus
Stoean et al [10] presented a new approach, Evolutionary
Support Vector Machines (ESVMs), for binary classification of
diabetes mellitus. ESVMs are formulated through
hybridization between the efficient learning model of SVM
and the optimization power of evolutionary computation.
Hybridization is achieved at the level of the constrained
optimization problem within the SVM, which is difficult to
solve directly. The experiments are carried out on the
benchmark diabetes problem from the UCI repository
of machine learning data sets. It is observed from the results
that the proposed hybrid model shows good results with higher
classification accuracy.
Nahla H Barakat [11] utilized SVM for the diagnosis of
diabetes. This work uses an additional intelligent module,
which transforms the “black box” model of an SVM into an
intelligible SVM diagnostic model with adaptive results. It is
observed from the results that intelligible SVMs provide a
potential framework for the prediction of diabetes, where a
logical rule set has been generated with a prediction accuracy
of 94%, sensitivity of 93%, and specificity of 94%.
Furthermore, the extracted rules are medically sound and
agree with the outcome of relevant medical studies.
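For reference, accuracy, sensitivity and specificity are all derived from the four cells of a confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical, chosen only so that the resulting rates are of the same order as those quoted above, and are not taken from [11].

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of diabetic cases that are detected
    specificity = tn / (tn + fp)   # share of healthy cases correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return accuracy, sensitivity, specificity


# Hypothetical counts: 100 diabetic and 100 non-diabetic test cases.
acc, sens, spec = diagnostic_metrics(tp=93, fp=6, tn=94, fn=7)
```

With these assumed counts the sensitivity is 0.93, the specificity 0.94 and the accuracy 0.935.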
C. Clustering and Hybrid Approaches for the Diagnosis of Diabetes Mellitus
A novel hybrid integration of Iterative Learning Control
(ILC) and Model Predictive Control (MPC) forms Model
Predictive Iterative Learning Control (MPILC) for glycemic
control in type 1 diabetes mellitus, as discussed by Youqing
Wang et al [12]. MPILC exploits two key factors: frequent
glucose readings and the repetitive nature of glucose-meal-
insulin dynamics over a 24-h cycle. The algorithm adapts to
the individual's lifestyle, which allows the control
performance to be improved.
Hemant and Pushpavathi [14] presented a novel approach
for diagnosing and classifying diabetes mellitus.
Initially, the K-means clustering approach is used to group the
disease-related data into clusters and to assign classes to the
clusters. Then, various classification approaches are trained on
the result set to generate the final classifier framework based
on K-fold cross validation. This approach is validated using
768 raw diabetes records. The proposed approach helps clinical
professionals in their diagnostic judgments and also in their
treatment procedures for different classes of diabetes mellitus.
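The K-fold cross validation step mentioned above can be sketched as follows. The value K = 10 is an assumption (the survey does not state which K was used), and the helper name `k_fold_indices` is introduced here purely for illustration; only the record count of 768 comes from the text.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# 768 records as in the text; K = 10 is an assumed value.
folds = list(k_fold_indices(768, 10))
```

Each record appears in exactly one test fold, so every classifier is scored on data it was not trained on. In practice the records would normally be shuffled before splitting; that step is omitted here to keep the sketch deterministic.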
Nirmala Devi et al [13] presented a fusion model that
integrates k-means clustering and k-Nearest Neighbor (KNN)
with multi-step preprocessing. It is observed that the KNN
algorithm provides significant performance on various data
sets. In this fusion model, the quality of the data is improved
by eliminating noisy data, thus improving the accuracy
and efficiency of the KNN algorithm. K-means clustering
identifies and removes incorrectly classified instances.
Classification is then carried out through KNN by taking
the correctly clustered samples of the preprocessed subset as
inputs. The best choice of k depends on the
data. The main goal of this work is to identify the value of k
that gives better classification accuracy on the Pima Indians
Diabetes Dataset (PIDD) using fusion-based KNN. It is observed
from the results that this fusion of KNN with preprocessing
provides the best result for different k values. For larger
values of k, the classification accuracy of the proposed fusion
framework is 97.4%. The results are also compared with simple
KNN and with cascaded K-means and KNN for the same k values.
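The cascade described above (cluster, discard instances whose label disagrees with their cluster, then classify with KNN) can be sketched on a toy data set. This is an illustration under stated assumptions rather than the method of [13]: the two-dimensional points, the deliberately mislabeled record, the deterministic two-cluster initialization and k = 3 are all hypothetical choices.

```python
from collections import Counter


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means_two(points, iterations=20):
    """Two-cluster k-means, started from the first point and the point farthest from it."""
    centroids = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [min((0, 1), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        for c in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assignment


def drop_misclustered(points, labels):
    """Keep only records whose label matches the majority label of their cluster."""
    assignment = k_means_two(points)
    majority = {}
    for c in set(assignment):
        votes = Counter(lab for lab, a in zip(labels, assignment) if a == c)
        majority[c] = votes.most_common(1)[0][0]
    return [(p, lab) for p, lab, a in zip(points, labels, assignment)
            if majority[a] == lab]


def knn_predict(train, query, k=3):
    nearest = sorted(train, key=lambda rec: sq_dist(rec[0], query))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


# Hypothetical data: two well-separated groups plus one mislabeled record.
points = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.1, 0.2),
          (0.9, 1.0), (1.0, 0.9), (1.1, 1.0), (1.0, 1.1),
          (0.15, 0.15)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]   # the last label is deliberate noise

cleaned = drop_misclustered(points, labels)
pred_low = knn_predict(cleaned, (0.2, 0.2))
pred_high = knn_predict(cleaned, (0.95, 0.95))
```

The mislabeled record falls in the cluster dominated by class 0 and is removed, so the KNN vote near that region is no longer influenced by it.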
Hybrid frameworks that integrate clustering and
classification have been presented to obtain higher accuracy.
In earlier work, hybrid classification approaches have
been developed to predict different types of diabetes by
analyzing the patients’ profiles [15]. This information is useful
in diagnosing a diabetic patient. A Hybrid Prediction Model
(HPM) that integrates the K-means clustering algorithm with a
C4.5 classifier is presented by Patil et al [16]. This framework
is trained on patients’ characteristics and measurements to
support the diabetes diagnosis. K-fold cross validation
has been used with the C4.5 decision tree classifier, and it
is found that the accuracy increases to 92.4% when
compared with the C4.5 algorithm alone. The framework clearly
separates patients at high risk of diabetes mellitus from
people with a lower probability of developing it [16]. Using
the same dataset, which consists of 392 cases with no missing
values, the proposed cascaded model obtained a classification
accuracy of 93.3%, compared with an accuracy of 73.6%
using the C4.5 classifier alone.
Norul Hidayah Ibrahim et al [17] presented a novel hybrid
framework that applies Agglomerative Hierarchical
Clustering and a Decision Tree Classifier to the Pima Indians
Diabetes dataset. This work presents a comparative analysis of
the accuracy of the Decision Tree Classifier
against the same classifier augmented with hierarchical
clustering. It is observed that the hybrid model achieved a
higher accuracy of 80.8%, compared with 76.9% for the
conventional model. The results show the potential
significance of hierarchical clustering in a rule-based
classifier.
Table 1: Summary of the Existing Approaches
Approach | Results Obtained | Features
Neural Network | Provides significant results with higher accuracy | Requires more time to process; large complexity of the network structure, etc.
Support Vector Machine | Higher classification accuracy | SVMs do not directly provide probability estimates, so these must be calculated using indirect techniques
Hybrid Approaches | Very high classification accuracy and prediction accuracy | Very simple to implement; robust to noisy training data, etc.
III. COMPARATIVE ANALYSIS OF THE EXISTING APPROACHES
The performance evaluation of the approaches discussed
above is based on certain performance metrics. The
diagnostic models for diabetes mellitus are evaluated
using two metrics: classification accuracy and processing
time.


Table 2: Methods for Diagnosing Diabetes Mellitus
Approach | Classification Accuracy (%) | Processing Time (sec)
Caipo Zhang et al. (2009) | 75 | 0.964
Nahla H. Barakat et al. (2010) | 94 | 0.856
Nirmala Devi et al. (2013) | 97.4 | 0.753
Norul Hidayah Ibrahim et al. (2013) | 81 | 0.8
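The comparison in Table 2 can be reproduced with a short script that ranks the approaches. The values are those of the table; the ranking rule (highest accuracy first, lower processing time as the tie-breaker) is an assumption made for this sketch, since no explicit rule is stated.

```python
# Rows of Table 2: (approach, classification accuracy in %, processing time in sec)
table2 = [
    ("Caipo Zhang et al. (2009)", 75.0, 0.964),
    ("Nahla H. Barakat et al. (2010)", 94.0, 0.856),
    ("Nirmala Devi et al. (2013)", 97.4, 0.753),
    ("Norul Hidayah Ibrahim et al. (2013)", 81.0, 0.8),
]

# Assumed rule: sort by accuracy (descending), then by time (ascending).
ranked = sorted(table2, key=lambda row: (-row[1], row[2]))
best = ranked[0]
fastest = min(table2, key=lambda row: row[2])
```

Under this rule the same approach is both the most accurate and the fastest entry in the table.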


[Chart: Classification Accuracy (%) versus Various Approaches]
[Chart: Processing Time (sec) versus Various Approaches]
Figure 2: Comparison of Methods for Diagnosing Diabetes Mellitus in Terms of Processing Time

A. Performance Comparison
The performance of the various algorithms is evaluated
based on the following parameters:
Convergence behavior
Processing Time
Classification Accuracy
a. Convergence Behavior

Figure 3: Comparison of Convergence Behavior (objective function versus iterations)
Figure 3 compares the convergence behavior of decision
trees, neural networks, clustering techniques and hybrid
approaches. It is observed from the figure that the clustering
and hybrid approaches converge in fewer iterations than the
other techniques. For instance, the decision tree approach
takes 90 iterations to converge and the neural network
approach takes 70 iterations, whereas the clustering
approaches take 40 iterations and the hybrid approaches take
30 iterations. Thus, the hybrid approaches are very
significant when compared with the other approaches taken
into consideration.
b. Processing Time
Table 3 shows the performance comparison of the
C4.5 classifier, neural networks, clustering techniques and
the hybrid approach.
c. Classification Accuracy
Table 3: Performance Comparison of the Various Techniques using Classification Accuracy
Technique | Classification Accuracy (%)
C4.5 Classifier [18] | 92.38
Neural Networks [8] | 80.88
Clustering Techniques [3] | 93.67
Hybrid Approach [15] | 96.68


Figure 4: Comparison of Classification Accuracy
It is observed from Figure 4 and Table 3 that the hybrid
approach achieves the highest classification accuracy (96.68%),
followed by the clustering techniques (93.67%).
IV. CONCLUSION
Diabetes mellitus is a deadly disease and one of the
chief public health challenges around the world. It is a fact
that 80% of type 2 diabetes complications can be prevented by
early identification of people at risk. Numerous machine
learning techniques have been used for the diagnosis and
classification of diabetes mellitus. Early detection of this
disease has become essential to improving the overall
clinical efficiency of the diagnosis process. This paper
presents an analytical study of numerous algorithms, including
clustering, classification, support vector machines and neural
networks. The results reported for clustering, neural network,
support vector machine and hybrid approaches have been
compared. It is observed that the hybrid approaches produce
significant results in terms of classification accuracy,
processing time, etc. This survey may assist doctors in the
earlier diagnosis of this dreadful disease.
REFERENCES
[1] K.H. Anders. A Hierarchical Graph-Clustering Approach to find Groups
of Objects. Proceedings 5th ICA Workshop on Progress in Automated
Map Generalization, IGN, Paris, France, 28-30 April, 2003.
[2] Centers for Disease Control and Prevention. National diabetes fact sheet:
national estimates and general information on diabetes and prediabetes
in the United States, 2011. Atlanta, GA: U.S. Department of Health and
Human Services, Centers for Disease Control and Prevention. (Accessed
April 5, 2011).
[3] D. Vijayalakshmi, K. Thilagavathi, “An Approach for Prediction of
Diabetic Disease by Using b-Colouring Technique in Clustering
Analysis”, International Journal of Applied Mathematical Research, 1
(4) (2012) 520-530.
[4] American Diabetes Association. Economic costs of diabetes in the U.S.
in 2007. Diabetes Care 2008; 31: 596-615.
[5] Danaei G, Friedman AB, Oza S, Murray CJ, Ezzati M. Diabetes
prevalence and diagnosis in US states: analysis of health surveys. Popul
Health Metr 2009; 7: 16.
[6] Gregg EW, Kirtland KA, Cadwell BL, et al. Estimated county level
prevalence of diabetes and obesity—United States, 2007. Morb Mortal
Wkly Rep 2009; 58(45): 1259-63.
[7] Alberti, K. G., and Zimmet, P. Z., Definition, diagnosis and
classification of diabetes mellitus and its complications, part 1: diagnosis
and classification of diabetes mellitus provisional report of a WHO
consultation. Diabet. Med. 15(7):539–553, 1998.
[8] Jaafar, S.F.B. and Ali, D.M., “Diabetes mellitus forecast using artificial
neural network (ANN)”, IEEE, 2005.
[9] Caipo Zhang, Jinjie Song, Zhilong Wu, “Fuzzy Integral be Applied to
the Diagnosis of Gestational Diabetes Mellitus”, IEEE, 2009.
[10] Stoean, R., Stoean, C., Preuss, M., El-Darzi, E., “Evolutionary
Support Vector Machines for Diabetes Mellitus Diagnosis”, IEEE, 2006.
[11] Nahla H Barakat, Andrew P Bradley, Mohamed Nabil H Barakat,
“Intelligible support vector machines for diagnosis of diabetes mellitus”,
IEEE Transactions on Information Technology in Biomedicine,
14(4):1114-20, 07/2010.
[12] Youqing Wang, Dassau, E., Doyle, F.J., “Closed-Loop Control of
Artificial Pancreatic β-Cell in Type 1 Diabetes Mellitus Using Model
Predictive Iterative Learning Control”, IEEE, 2010.
[13] Nirmala Devi, M., Appavu, S., Swathi, U.V., “An amalgam KNN to
predict diabetes mellitus”, IEEE, 2013.
[14] Hemant, P. and Pushpavathi, T., “A novel approach to predict diabetes
by Cascading Clustering and Classification”, IEEE, 2012.
[15] A. G. Karegowda, M. A. Jayaram, and A. S. Manjunath, “Cascading K-
means clustering and K-Nearest Neighbor classifier for categorization of
diabetic patients,” International Journal of Engineering and Advanced
Technology, 1(3):2249 – 8958, 2012.
[16] B. M. Patil, R. C. Joshi, and D. Toshniwal, “Hybrid prediction model for
type-2 diabetic patients,” Expert Systems with Applications vol. 37, p.
8102–8, 2010.
[17] Norul Hidayah Ibrahim, Aida Mustapha, Rozilah Rosli, Nurdhiya
Hazwani Helmee, “A Hybrid Model of Hierarchical Clustering and
Decision Tree for Rule-based Classification of Diabetic Patients”,
International Journal of Engineering and Technology (IJET) Vol 5 No 5
Oct-Nov 2013.
[18] S. Priya, R.R. Rajalaxmi, “An Improved Data Mining Model to Predict
the Occurrence of Type-2 Diabetes using Neural Network”, International
Conference on Recent Trends in Computational Methods,
Communication and Controls (ICON3C 2012).
[19] M. Ramaswamy, D. Anitha, S. Priya Kuppamal, R. Sudha, S. Fepslin
Athish Mon, “A Study and Comparison of Automated Techniques for
Exudate Detection Using Digital Fundus Images of Human Eye: A
Review for Early Identification of Diabetic Retinopathy”, Int. J. Comp.
Tech. Appl., Vol 2 (5), 1503-1516.
[20] Phee, H.K., Tung, W.L., Quek, C., “A personalized approach to insulin
regulation using brain-inspired neural semantic memory in diabetic
glucose control”, IEEE Congress on Evolutionary Computation
(CEC 2007), 2007.

Mr. S. Peter is the CEO and founder of Softwings
Technologies, India’s pioneering provider of software
development, web development, multimedia and
market research operations to the global software
industry. He graduated from Anna University
with a Master of Science degree in
Software Engineering. His areas of interest include
cloud computing, software engineering and data
mining. He is a charter member of Youth Entrepreneurs of Coimbatore.