Analysis and Prediction of Diabetes Mellitus Using Machine Learning Algorithm
Minyechil Alehegn, Rahul Joshi
Department of Computer Science and Engineering
Symbiosis International University, Pune, Maharashtra, 412115, India
(minyechil.tefera,rahulj)@sitpune.edu.in
Abstract
Data mining techniques (DMTs) are very helpful for analysing medical datasets and predicting disease at an early stage, which can save human lives. Large amounts of medical data are openly available from different sources and can be used in real-world applications. One of the tasks of machine learning is prediction on disease data. Currently, diabetes disease (DD) is one of the leading causes of death all over the world. Various data mining techniques have been used by different researchers over time to cluster and predict symptoms in medical data. In this study, a dataset of 768 records was obtained from the Pima Indian Diabetes Dataset (PIDD), which is accessible online. The proposed system applies well-known predictive algorithms, namely SVM, Naïve Net, and DecisionStump, together with a Proposed Ensemble Method (PEM). PEM is an ensemble hybrid model built by combining the individual techniques into one. The proposed ensemble method (PEM) achieves a high accuracy of 90.36%.

Keywords: collaborative; Diabetes; classification; Machine learning; Data mining; SVM; Naïve Net; Decision Stump; PEM

1. INTRODUCTION
Currently, many chronic diseases are spread throughout the world, in both developing and developed countries. Among these serious diseases, diabetes mellitus is a chronic disease that cuts human life short at an early age. The name diabetes mellitus (DM) was given by health professionals. Many people in the world have died from this serious disease, and at present diabetes is increasing very rapidly in countries such as India and in some Saharan countries.

According to the World Health Organization (WHO) report issued on World Diabetes Day, 14 November 2016 ("Eye on diabetes"), 422 million adults are living with diabetes and 1.6 million deaths are attributed to it. As the report indicates, it is not difficult to see how serious and chronic diabetes is. Type II diabetes is treatable, but type I is not curable, which makes patients difficult to treat.

In 2014, 8.5% of adults aged 18 years and older had diabetes mellitus. In 2012, high blood glucose was one of the causes of 2.2 million deaths [40].

According to the Centers for Disease Control and Prevention (CDC), over the nine years between 2001 and 2009, type II diabetes increased by 23% in the United States (US). Different countries, organizations, and health sectors are concerned with controlling and preventing this chronic disease before it causes death, which requires the early detection of diabetes in order to save human life. Diet is also a factor in diabetes, and exercise helps keep people healthy; even a person living with diabetes can improve their condition through exercise.

Diabetes can damage different parts of the human body, including the heart, eyes, kidneys, and nerves. This shows how chronic and dangerous the disease is and how it shortens human life.

Tao et al. [2] note that the algorithms used in machine learning differ in their power for classification and prediction. Saba et al. [12] observe that no single technique gives the best performance and accuracy for all diseases: one classifier may perform best on a given dataset, while another approach outperforms the others on other diseases. The proposed study therefore concentrates on a novel combination (hybridization) of different classifiers for diabetes mellitus (DM) classification and prediction, thus overcoming the limitations of individual classifiers.

The proposed study uses different machine learning techniques (MLTs) to predict diabetes mellitus (DM) at an early stage in order to save human life. These algorithms are SVM, Naïve Net, Decision Stump, and PEM, used to predict diabetes and to increase prediction accuracy and performance.

2. RELATED WORK
Messan et al. [5] explained various algorithms using different parameters such as Glucose, Blood Pressure (BP), Skin Thickness (ST), Insulin, Body Mass Index (BMI), Diabetes Pedigree Function (DPF), and Age. Not all parameters were included, and only a small sample of data was used. ANN, ELM, GMM, logistic regression, and SVM were applied to a diabetes dataset, and the artificial neural network (ANN) provided better accuracy and performance than the other algorithms.

Xue-Hui Meng et al. [38] used different data mining techniques to predict diabetes from real-world data collected through a questionnaire. SPSS and Weka were used for data analysis and prediction, respectively. ANN, logistic regression, and J48 were compared, and J48 provided efficient and better accuracy.

Weifeng Xu et al. [3] applied different machine learning algorithms to the prediction of diabetes: Naïve Bayes (NB), Random Forest (RF), ID3, and AdaBoost. Among these, RF provided better accuracy than the other data mining techniques.

Ioannis et al. [1] used 10-fold cross-validation as the evaluation method for three algorithms: logistic regression, Naïve Bayes, and SVM. Of the three, SVM provided higher accuracy and performance than the others.

Tao et al. [2] applied KNN, Naïve Bayes, Random Forest, decision tree, SVM, and logistic regression to predict diabetes mellitus (DM) at an early stage. The study concentrated on filtering; recall was better than the other accuracy measures, and the filtering criteria could be improved.

Yunsheng et al. [4] used KNN and DISKR. Storage space was reduced by eliminating instances with little influence, and removing outliers increased both performance and accuracy.

Swarupa et al. [7] used various datasets, including a diabetes dataset. No cross-validation technique was applied for evaluation. Naïve Bayes (NB) provided good accuracy, with an accuracy value of 77.01%.

Sajida et al. [9] applied Bagging, AdaBoost, and J48 to the prediction of diabetes; AdaBoost provided better performance and accuracy.

Pradeep & Dr. Naveen [8] found J48 to be one of the most popular algorithms, noted for good accuracy and performance. Feature selection was applied in order to increase accuracy and performance. Pradeep et al. [11] found that the J48 machine learning algorithm provided better performance and accuracy.

Santhanam and Padmavathi [10] applied K-means, a genetic algorithm, and SVM, which increased the accuracy.

Xue-Hui Meng et al. [13] compared J48, logistic regression, and KNN; among these algorithms, J48 provided high performance. Krati et al. [14] applied k-nearest neighbour (KNN) and obtained an accuracy of 70%. Ramiro et al. [6] reduced wrong treatment by applying a fuzzy rule mechanism, which also helps doctors as a recommender system so that patients can be treated without mistakes. Saba et al. [12] applied different data mining algorithms, among which a meta classifier provided higher accuracy than a single classifier.
3. METHODOLOGY
Different studies have been carried out on diabetes. A summary of the common or major findings is given as follows.
Table I: Summary of major findings of diabetes prediction methodologies
Sn | Author(s) | Methodologies | Findings
1 | Tao et al. [2] | KNN, Naïve Bayes, Decision Tree, Random Forest, SVM, and Logistic Regression | Concentrated on recall and obtained better results. The filtering criteria can be improved.
2 | Ioannis et al. [1] | Naïve Bayes, Logistic Regression, and SVM | Of the three algorithms, SVM provided the highest accuracy, 84%.
3 | Weifeng Xu et al. [3] | ID3, Naïve Bayes, Random Forest, and AdaBoost | Random Forest performed better than the others; in contrast, ID3 provided the lowest accuracy.
4 | Yunsheng et al. [4] | DISKR and KNN | Attributes with little influence should be eliminated. Accuracy can be increased by removing outliers. Space complexity decreased.
5 | Messan et al. [5] | GMM, ELM, ANN, LR, and SVM | A small amount of sample data was used. Of the compared algorithms, the artificial neural network provided better accuracy than the other classifiers.
6 | Ramiro et al. [6] | Fuzzy rules | Wrong treatment was reduced using fuzzy rules, and a recommendation system was developed for doctors.
7 | Swarupa et al. [7] | KNN, J48, ANN, ZeroR, NB, CV parameter selection, Filtered classifier, and Simple CART | Various datasets, including a diabetes dataset, were used. Cross-validation was not applied. NB showed the highest accuracy, 77.01%.
8 | Pradeep & Dr. Naveen [8] | Decision tree (J48) | J48 is noted as a good accuracy provider. Feature selection plays an important role in prediction.
9 | Sajida et al. [9] | AdaBoost, J48, and Bagging | AdaBoost showed improved accuracy over the other methods.
10 | Santhanam and Padmavathi [10] | K-means with Genetic Algorithm, and SVM | Clustering and classification were integrated and provided better performance.
11 | Pradeep et al. [11] | KNN, J48, SVM, and Random Forest | Before pre-processing, J48 provided the best accuracy, 73.82%. After pre-processing, KNN and RF provided good accuracy.
12 | Saba et al. [12] | CART, C4.5, Bagging, and ID3 | The algorithms were applied to two diabetes datasets.
13 | Xue-Hui Meng et al. [13] | KNN, Logistic Regression, and J48 | An accuracy of 78.27% was measured.
14 | Krati et al. [14] | KNN | Accuracies of 70% and 57% were measured on test set 1 and test set 2, respectively.
15 | Saravananathan and Velmurugan [15] | SVM, CART, KNN, and J48 | J48, CART, SVM, and KNN were applied, with accuracies of 67.15%, 62.28%, 65.04%, and 53.39%, respectively.
16 | Yang et al. [16] | NB and Bayes network | An accuracy of 72.3% was measured with the Bayes network.
17 | Asma [17] | Decision tree | Showed an accuracy of 78.1768%.
18 | Anjli and Varun [18] | SVM | An accuracy of 72% was measured.
19 | Thirumal et al. [19] | C4.5, SVM, KNN, and Naïve Bayes | C4.5 showed better accuracy than the others, at 78.2552%.
20 | Ayush and Divya [20] | CART | Provided an accuracy of 75%.
21 | Veena and Anjali [21] | SVM, Decision Stump, NB, and decision tree | Decision Stump gave the best accuracy, 80.72%.
22 | Anuja and Chitra [22] | SVM | Showed better performance, with an accuracy of 78%.
23 | Prajwala [23] | DT and RF | Random Forest performed better than the decision tree.
24 | Bum et al. [24] | NB, Logistic Regression, and anthropometry | Focused on predicting the fasting plasma glucose level. An accuracy of 74.1% was measured using anthropometry.
25 | Aruna and Nazneen [25] | Fuzzy rules, GA, and KNN | Rules were generated.
26 | Sakorn [26] | Expert system and fuzzy rules | Focused on an expert system developed for treatment purposes.
27 | Seokho et al. [28] | SVM and E2_SVM | E2_SVM gave the better accuracy, 80%.
28 | Emrana et al. [27] | KNN and C4.5 | C4.5 provided an accuracy of 90.43% and KNN an accuracy of 76.96%.
29 | Kamadi et al. [30] | DT, Gini index, and Gaussian fuzzy function | The DT model showed good and efficient accuracy compared with the other methods.
30 | Munaza Ramzan [29] | J48, Naïve Bayes, and RF | RF provided better accuracy than J48 and Naïve Bayes under 10-fold cross-validation.
31 | Patil et al. [32] | HPM | An accuracy of 92.38% was recorded using HPM.
32 | Abdullah et al. [31] | Support vector machine | Prediction of effective treatment was done using this technique.
33 | Mounika et al. [33] | ZeroR, NB, and OneR | Effective treatment was applied to young and old patients. NB performed better than the other methods.
34 | Nongyao and Rungruttikarn [34] | LR, Boosting, Naïve Bayes, ANN, Bagging, and Decision tree | An accuracy of 85.558% was measured using the Random Forest technique, recorded as the best accuracy.
35 | Amit and Pragati [36] | RF, MLP, C4.5, and Bayes Net | The combination MLP + Bayes Net showed a better accuracy, 81.89%, than the other classification algorithms.
36 | Saba et al. [35] | NB, HMV, RF, AdaBoost, KNN, LR, and SVM | Focused on various diseases, including diabetes. HMV achieved the best accuracy, 78.085%.
37 | Rian and Irwansyah [37] | Fuzzy rules | Different rules were generated that help in the early detection of diabetes.
3.1 Data Pre-processing Methods
The raw data used for analysis and prediction should be safely collected, joined, and prepared for investigation [13]. The dataset for the proposed system was obtained from the public UCI repository: PIDD (Pima Indian Diabetes Database), which is open and freely available online [42]. We use this dataset for the analysis and prediction of diabetes mellitus. It consists of 768 records with eight attributes and one class attribute, and the class is binary. In the proposed study, Weka 3.8.1 and Java were used for analysis, classification, and prediction. In addition, we use a hybrid model with base learners in order to increase performance and accuracy.

3.2 Classification and Prediction Methods
In the proposed study, the input parameters are insulin, pregnancies, Blood Pressure (BP), skin thickness, Glucose, Body Mass Index (BMI), Diabetes Pedigree Function, and Age. Different data mining methods and statistical techniques can be used to predict diabetes. Based on the existing literature, we chose four prediction methods: the Support Vector Machine (SVM), Naïve Net (NN), and DecisionStump (DS) classification algorithms, plus a combination of their predictions into one using a base learner to increase prediction accuracy. Each classification method and its specific requirements in this study are described as follows.

Support Vector Machine (SVM)
The support vector machine is one of the most popular and widely used machine learning techniques. It is often described as a binary approach because it is used for binary classification, such as present or absent, on or off, normal or abnormal, impactful or not impactful. In this study it is used to predict whether a patient is diabetic or non-diabetic. It works by maximizing the margin, that is, the distance between the separating hyperplane and the nearest data points of each class. SVM can be used for both regression and classification.

Steps
Step 1: Identify the right hyperplane that separates the two classes.
Step 2: Maximize the distance (margin) between the hyperplane and the nearest data points of each class.
Step 3: If the classes cannot be separated by a straight line, add a feature such as z = x^2 + y^2; in the resulting higher-dimensional space, SVM can separate them.
Step 4: Apply the SVM classifier to classify the records; the class is binary.

Naïve Nets (NNs)
This technique has low time complexity. It computes class probabilities using Bayes' formula. As the name indicates, it naively assumes that the features are conditionally independent given the class. It chooses the class C that maximizes P(C | F), the probability of the class given the features F.

Steps
Step 1: Convert the data into a frequency table.
Step 2: Find the likelihoods.
Step 3: Apply the naïve Bayes equation; this is where the prediction is made:

$$P(C \mid F_1, \dots, F_n) \propto P(C) \prod_{i=1}^{n} P(F_i \mid C)$$

Decision Stump (DS)
The decision stump is a popular machine learning classification algorithm: a one-level decision tree that makes its decision on the value of a single attribute. It is often used as the base learner in ensemble methods, especially boosting, which is one of the reasons it was chosen here.

Collaborative (Ensemble) Model
For prediction purposes, individual prediction algorithms do not always provide sufficiently good and efficient performance. To increase accuracy and performance, the predictions of the individual algorithms are combined into one. This collaborative approach overcomes the limitations of the individual classifiers and improves accuracy by combining them [12, 32].
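The SVM steps above can be illustrated with a minimal sketch in plain Python. It is not the Weka SVM used in the study (whose settings are not reported): it assumes hypothetical toy points, an inner disc labelled -1 and a surrounding ring labelled +1, that no straight line in (x, y) can separate. After the Step 3 mapping z = x^2 + y^2 the data are one-dimensional, and a hard-margin linear SVM then reduces to a threshold placed halfway between the closest points of the two classes (the support vectors). The function names `lift`, `fit_max_margin_threshold`, and `svm_predict` are hypothetical.

```python
def lift(point):
    """Step 3: map a 2-D point (x, y) to the extra feature z = x^2 + y^2."""
    x, y = point
    return x * x + y * y


def fit_max_margin_threshold(zs, labels):
    """Hard-margin linear SVM for one-dimensional, separable data.

    In 1-D the separating hyperplane is a single threshold t (Step 1).
    The maximum-margin threshold lies halfway between the closest points
    of the two classes, i.e. the support vectors (Step 2).
    Assumes class +1 lies above class -1 on the z axis.
    """
    neg = [z for z, y in zip(zs, labels) if y == -1]
    pos = [z for z, y in zip(zs, labels) if y == +1]
    lo, hi = max(neg), min(pos)          # support vectors
    if lo >= hi:
        raise ValueError("classes are not linearly separable on z")
    return (lo + hi) / 2, (hi - lo) / 2  # (threshold, margin)


def svm_predict(threshold, point):
    """Step 4: binary decision, +1 (outer ring) or -1 (inner disc)."""
    return +1 if lift(point) > threshold else -1


# Hypothetical toy data: the ring surrounds the disc on all sides,
# so no straight line in (x, y) separates them, but z does.
inner = [(0, 0), (0.5, 0), (0, -0.5), (-0.5, 0.5)]     # label -1
outer = [(2, 0), (0, 2), (-2, 0), (0, -2), (1.5, 1.5)]  # label +1
points = inner + outer
labels = [-1] * len(inner) + [+1] * len(outer)

threshold, margin = fit_max_margin_threshold([lift(p) for p in points], labels)
print(threshold, margin)               # 2.25 1.75
print(svm_predict(threshold, (1, 1)))  # z = 2 is below 2.25, so -1
```

Working in one dimension keeps the margin computation exact; a full SVM solves the same maximum-margin problem in many dimensions with an optimization solver.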
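The three Naïve Nets steps (frequency table, likelihood, Bayes equation) can be sketched as a categorical naïve Bayes classifier in plain Python. The discretized attributes (a glucose level and a BMI category), the six records, and the Laplace smoothing constant `alpha = 1` are assumptions for illustration, not the study's data or Weka settings; all function names are hypothetical. Multiplying the per-feature likelihoods in Step 3 is where the conditional-independence assumption enters.

```python
from collections import Counter, defaultdict


def fit_frequency_tables(rows, labels):
    """Step 1: count each class and each (feature, value) pair per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    seen_values = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            seen_values[j].add(value)
    return class_counts, value_counts, seen_values


def likelihood(tables, j, value, label, alpha=1.0):
    """Step 2: P(feature j = value | class), with Laplace smoothing alpha."""
    class_counts, value_counts, seen_values = tables
    count = value_counts[(j, label)][value]
    return (count + alpha) / (class_counts[label] + alpha * len(seen_values[j]))


def posterior(tables, row):
    """Step 3: P(class) * product of P(feature | class), normalized to sum to 1."""
    class_counts = tables[0]
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total                      # prior P(class)
        for j, value in enumerate(row):
            score *= likelihood(tables, j, value, label)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical toy records: (glucose level, BMI category) -> class
rows = [("high", "obese"), ("high", "normal"), ("normal", "obese"),
        ("normal", "normal"), ("high", "obese"), ("normal", "normal")]
labels = ["diabetic", "diabetic", "non-diabetic",
          "non-diabetic", "diabetic", "non-diabetic"]

tables = fit_frequency_tables(rows, labels)
probs = posterior(tables, ("high", "obese"))
print(max(probs, key=probs.get), round(probs["diabetic"], 3))  # diabetic 0.857
```

Smoothing keeps a value that never occurs with a class (here, a high glucose level among non-diabetic records) from forcing that class's probability to zero.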
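A decision stump, as described above, can be sketched in plain Python as an exhaustive search for the single best split on one attribute. The eight (glucose, BMI) records and their labels are hypothetical, and trying midpoints between consecutive distinct values is one common choice of candidate thresholds, not necessarily Weka's exact rule. The names `majority`, `fit_stump`, and `stump_predict` are illustrative.

```python
def majority(labels):
    """Most frequent label (ties go to the smallest label, for determinism)."""
    return max(sorted(set(labels)), key=labels.count)


def fit_stump(rows, labels):
    """Fit a one-level decision tree: 'feature <= t' versus 'feature > t'.

    Every feature and every midpoint between consecutive distinct values is
    tried; each side predicts its majority class, and the split with the
    fewest training errors wins. Returns (feature, threshold, left, right).
    """
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    if best is None:
        raise ValueError("no feature has two distinct values")
    return best[1:]


def stump_predict(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


# Hypothetical toy records: (glucose, BMI) with class 1 = diabetic
rows = [(85, 26.6), (89, 28.1), (95, 23.0), (110, 37.0),
        (148, 33.6), (183, 23.3), (137, 43.1), (168, 38.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

stump = fit_stump(rows, labels)
print(stump)                              # (0, 123.5, 0, 1): split on glucose
print(stump_predict(stump, (120, 30.0)))  # 0
```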
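The decision stump is described as a typical base learner for boosting, and Table 1 reports AdaBoostM1, so the following sketch shows discrete AdaBoost over one-dimensional stumps in plain Python. For two classes, this reweighting is equivalent to AdaBoost.M1 up to a constant factor in the vote weights. The toy data are hypothetical and chosen so that no single stump can fit them; this illustrates the technique and is not the configuration used in the study.

```python
import math


def stump_output(x, t, polarity):
    """A 1-D stump: +polarity above the threshold t, -polarity at or below it."""
    return polarity * (1 if x > t else -1)


def fit_weighted_stump(xs, ys, w):
    """Pick the threshold and polarity with the lowest weighted error."""
    values = sorted(set(xs))
    # a threshold below every value gives a constant stump
    candidates = [values[0] - 1] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for polarity in (+1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if stump_output(x, t, polarity) != y)
            if best is None or err < best[2]:
                best = (t, polarity, err)
    return best


def adaboost(xs, ys, rounds):
    """Discrete AdaBoost with stumps; ys must be +1 or -1."""
    n = len(xs)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        t, polarity, err = fit_weighted_stump(xs, ys, w)
        if err >= 0.5:               # no better than chance: stop
            break
        err = max(err, 1e-10)        # guard against division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, polarity))
        # misclassified records gain weight, correct ones lose weight
        w = [wi * math.exp(-alpha * y * stump_output(x, t, polarity))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def boosted_predict(model, x):
    score = sum(alpha * stump_output(x, t, p) for alpha, t, p in model)
    return 1 if score > 0 else -1


# Hypothetical toy data: class +1 sits in the middle, so one stump cannot fit it
xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, 1, 1, -1, -1]

single = fit_weighted_stump(xs, ys, [1 / 6] * 6)
model = adaboost(xs, ys, rounds=3)
print(round(single[2], 3))                      # 0.333: error of the best single stump
print([boosted_predict(model, x) for x in xs])  # [-1, -1, 1, 1, -1, -1]
```

Three weighted stumps together carve out the middle interval that no single stump can represent, which is the kind of gain over an individual classifier that the ensemble approach relies on.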
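The collaborative model combines the individual predictions into one, but the paper does not spell out the combining rule (Fig 1 names a meta classifier, and the discussion mentions AdaBoostM1). A majority vote is one simple way to combine three classifiers and is shown here as an assumption, not as the authors' method. The per-record predictions are hypothetical, chosen so that each base classifier errs on different records.

```python
from collections import Counter


def majority_vote(votes):
    """Most common label among the base classifiers' votes.
    With an odd number of voters and two classes there are no ties."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(per_model_predictions):
    """per_model_predictions: one list of predicted labels per base classifier,
    aligned by record. Returns one voted label per record."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical predictions on six records (1 = diabetic, 0 = non-diabetic)
truth = [1, 0, 1, 1, 0, 0]
svm = [1, 0, 0, 1, 0, 1]
nnet = [1, 1, 1, 1, 0, 0]
stump = [0, 0, 1, 0, 0, 0]

voted = ensemble_predict([svm, nnet, stump])
for name, pred in [("SVM", svm), ("Naive Net", nnet),
                   ("Decision Stump", stump), ("Vote", voted)]:
    print(name, round(accuracy(pred, truth), 3))
```

The vote only helps when the base classifiers make different mistakes; if all three erred on the same records, combining them would change nothing.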
Accuracy
Accuracy measures performance as a percentage. It is calculated as follows:

$$\text{Accuracy}\ (\%) = \frac{\text{correctly classified}}{\text{correctly classified} + \text{incorrectly classified}} \times 100$$

[Figure: proposed workflow. Dataset → data pre-processing (fill missing values, remove duplication, data conversion) → training set and testing set → machine learning algorithms (SVM, Naïve Net, Decision Stump), validated with 10-fold cross-validation → SVM, Naïve Net, and Decision Stump predictions → evaluation (accuracy, error rate) → comparison of the individual predictions → proposed method (PM) → meta classifier → final prediction]
Fig 1: Proposed Work Flow

Table 1: Predictive accuracy (%) of the individual classifiers and the ensemble method on PIDD
Algorithm | Classification accuracy (%) | Incorrectly classified (%)
SVM | 88.8 | 11.2
Bayes Net | 88.54 | 11.46
DecisionStump | 83.72 | 16.28
AdaBoostM1 | 85.68 | 14.32
Proposed method (PM) | 90.36 | 9.64
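The accuracy formula can be checked against Table 1. Assuming 10-fold cross-validation over all 768 records, each record is tested exactly once, so every percentage should correspond to a whole number of correctly classified records. The counts below are back-calculated from the reported percentages; the paper does not report them, and the dictionary `implied_correct` is a hypothetical name.

```python
def accuracy_percent(correct, incorrect):
    """Accuracy (%) = correct / (correct + incorrect) * 100."""
    return correct / (correct + incorrect) * 100


N = 768  # PIDD records; 10-fold CV tests each record exactly once (assumption)
# Correct counts back-calculated from the Table 1 percentages (not reported)
implied_correct = {
    "SVM": 682,
    "Bayes Net": 680,
    "DecisionStump": 643,
    "AdaBoostM1": 658,
    "Proposed method (PM)": 694,
}

for name, correct in implied_correct.items():
    acc = accuracy_percent(correct, N - correct)
    # error rate = percentage of incorrectly classified records
    print(f"{name}: {acc:.2f}% correct, {100 - acc:.2f}% incorrect")
```

Each back-calculated count reproduces both columns of Table 1 to two decimal places (for example, 694 / 768 × 100 ≈ 90.36 and 74 / 768 × 100 ≈ 9.64), which is consistent with evaluation over all 768 records.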
Results
Data collection
The proposed system was evaluated using the Pima Indian Diabetes Dataset (PIDD), which is publicly available online in the UCI data repository. The dataset contains 768 records with eight attributes and one class. The attributes are insulin, pregnancies, Blood Pressure (BP), skin thickness, Glucose, Body Mass Index (BMI), Diabetes Pedigree Function, and Age [42].
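Fig 1 lists three pre-processing steps (fill missing values, remove duplication, data conversion) without saying how each is done. The following is a minimal sketch in plain Python under these assumptions: column names follow the Kaggle copy of the dataset [42]; a zero in Glucose, BloodPressure, SkinThickness, Insulin, or BMI is treated as a missing measurement (a common convention for this dataset, not stated in the paper); missing values are filled with the column median; and the numeric class is converted to nominal labels. The helper names and the four records are illustrative.

```python
from statistics import median

COLUMNS = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
           "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
# Assumption: a 0 in these columns is not a valid measurement, so treat it as missing
ZERO_MEANS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def record(*values):
    """Build one record (a dict keyed by COLUMNS) from positional values."""
    return dict(zip(COLUMNS, values))


def preprocess(records):
    # 1. Remove duplication: keep the first copy of each identical record.
    seen, rows = set(), []
    for r in records:
        key = tuple(r[c] for c in COLUMNS)
        if key not in seen:
            seen.add(key)
            rows.append(dict(r))  # copy, so the input is left unchanged
    # 2. Fill missing values with the median of the observed values in that column.
    for col in ZERO_MEANS_MISSING:
        observed = [r[col] for r in rows if r[col] != 0]
        if observed:
            fill = median(observed)
            for r in rows:
                if r[col] == 0:
                    r[col] = fill
    # 3. Data conversion: numeric class 0/1 -> nominal class label.
    for r in rows:
        r["Outcome"] = "diabetic" if r["Outcome"] == 1 else "non-diabetic"
    return rows


# Illustrative records in the dataset's column order; the last duplicates the first
raw = [
    record(1, 85, 66, 29, 0, 26.6, 0.351, 31, 0),
    record(1, 89, 66, 23, 94, 28.1, 0.167, 21, 0),
    record(0, 137, 40, 35, 168, 43.1, 2.288, 33, 1),
    record(1, 85, 66, 29, 0, 26.6, 0.351, 31, 0),
]
clean = preprocess(raw)
print(len(clean), clean[0]["Insulin"], [r["Outcome"] for r in clean])
# 3 131.0 ['non-diabetic', 'non-diabetic', 'diabetic']
```

Pregnancies is deliberately left out of the missing-value list, since zero pregnancies is a valid value.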
[Figure: bar chart of classification accuracy (%) by algorithm: SVM 88.8, Bayes Net 88.54, DecisionStump 83.72, AdaBoostM1 85.68, Proposed method 90.36]
Fig 2: Accuracy of Algorithm
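The accuracies in Table 1 and Fig 2 were obtained with 10-fold cross-validation. The procedure can be sketched in plain Python: the records are shuffled and dealt into ten folds, and each fold is used once as the test set while the remaining nine train the model. This sketch uses plain (unstratified) folds and a hypothetical majority-class baseline as the model; the class split of 500 non-diabetic and 268 diabetic records is an assumption, not a figure taken from the paper.

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Shuffle record indices 0..n-1 and deal them into k folds whose sizes
    differ by at most one; every index lands in exactly one fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(n, train_and_count_correct, k=10, seed=1):
    """Each fold serves once as the test set while the other k-1 folds train.
    train_and_count_correct(train_idx, test_idx) returns how many test records
    were classified correctly. Returns overall accuracy in percent."""
    folds = k_fold_indices(n, k, seed)
    correct = 0
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        correct += train_and_count_correct(train_idx, test_idx)
    return correct / n * 100


# Assumed class split for the 768 PIDD records: 500 non-diabetic (0), 268 diabetic (1)
labels = [0] * 500 + [1] * 268


def majority_baseline(train_idx, test_idx):
    """Hypothetical baseline: always predict the training folds' majority class."""
    train = [labels[i] for i in train_idx]
    majority = max((0, 1), key=train.count)
    return sum(labels[i] == majority for i in test_idx)


print(round(cross_validate(len(labels), majority_baseline), 2))  # 65.1
```

Under the assumed split, always predicting the majority class scores about 65.1%, a useful floor when reading the 83.72% to 90.36% accuracies reported in Table 1.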
4. Discussion
The results show that different classifiers produce different results. We observe that the combined (ensemble) classifier provides better accuracy than the individual classifiers, so improving the prediction accuracy of diabetes with such methods supports early diagnosis and helps save human life. In this study we compared different classifier algorithms, including the proposed ensemble method. The proposed system uses three machine learning algorithms that are widely used in diabetes prediction, and each individual algorithm provided lower accuracy than the combined one. As Table 1 shows, the proposed system achieves higher accuracy than the other machine learning algorithms, and its error rate (the percentage of incorrectly classified records) is lower than theirs. 10-fold cross-validation was used as the evaluation method, and the ensemble approach uses AdaBoostM1.

Cross Validation
Cross-validation is an evaluation method that splits the data into training and testing sets for prediction. Different studies use different schemes, such as percentage split, 5-fold cross-validation, and 3-fold cross-validation. In this study, 10-fold cross-validation was used to evaluate diabetes prediction.

5. CONCLUSION
Various data mining methods and their applications were reviewed. Machine learning algorithms have been applied to different medical datasets, including diabetes datasets, and machine learning methods perform differently on different datasets. We obtained a 768-record diabetes dataset from UCI and compared the individual algorithms with the proposed method, using 10-fold cross-validation to evaluate the performance of these machine learning classification methods. The proposed method achieved the highest accuracy, 90.36%, while DecisionStump achieved the lowest, 83.72%. Therefore, an ensemble method provides better prediction performance and accuracy than a single classifier.
Future Work
In this study only a diabetes dataset was used; in future work, the approach can be extended to other datasets. It is also possible to use other algorithms as base learners, such as an artificial neural network (ANN), a decision tree, or naïve Bayes. The proposed system uses a small sample with only eight attributes; other risk factors for diabetes could be added.

REFERENCES
[1] Kavakiotis, I., Tsave, O., Salifoglou, A., Maglaveras, N., Vlahavas, I., & Chouvarda, I. (2017). Machine learning and data mining methods in diabetes research. Computational and Structural Biotechnology Journal.
[2] Zheng, T., Xie, W., Xu, L., He, X., Zhang, Y., You, M., ... & Chen, Y. (2017). A machine learning-based framework to identify type 2 diabetes through electronic health records. International Journal of Medical Informatics, 97, 120-127.
[3] Xu, W., Zhang, J., Zhang, Q., & Wei, X. (2017, February). Risk prediction of type II diabetes based on random forest model. In Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB), 2017 Third International Conference on (pp. 382-386). IEEE.
[4] Song, Y., Liang, J., Lu, J., & Zhao, X. (2017). An efficient instance selection algorithm for k nearest neighbour regression. Neurocomputing, 251, 26-34.
[5] Komi, M., Li, J., Zhai, Y., & Zhang, X. (2017, June). Application of data mining methods in diabetes prediction. In Image, Vision and Computing (ICIVC), 2017 2nd International Conference on (pp. 1006-1010). IEEE.
[6] Meza-Palacios, R., Aguilar-Lasserre, A. A., Ureña-Bogarín, E. L., Vázquez-Rodríguez, C. F., Posada-Gómez, R., & Trujillo-Mata, A. (2017). Development of a fuzzy expert system for the nephropathy control assessment in patients with type 2 diabetes mellitus. Expert Systems with Applications, 72, 335-343.
[7] Rani, A. S., & Jyothi, S. (2016, March). Performance analysis of classification algorithms under different datasets. In Computing for Sustainable Global Development (INDIACom), 2016 3rd International Conference on (pp. 1584-1589). IEEE.
[8] Pradeep, K. R., & Naveen, N. C. (2016, December). Predictive analysis of diabetes using J48 algorithm of classification techniques. In Contemporary Computing and Informatics (IC3I), 2016 2nd International Conference on (pp. 347-352). IEEE.
[9] Perveen, S., Shahbaz, M., Guergachi, A., & Keshavjee, K. (2016). Performance analysis of data mining classification techniques to predict diabetes. Procedia Computer Science, 82, 115-121.
[10] Santhanam, T., & Padmavathi, M. S. (2015). Application of K-means and genetic algorithms for dimension reduction by integrating SVM for diabetes diagnosis. Procedia Computer Science, 47, 76-83.
[11] Kandhasamy, J. P., & Balamurali, S. (2015). Performance analysis of classifier models to predict diabetes mellitus. Procedia Computer Science, 47, 45-51.
[12] Bashir, S., Qamar, U., Khan, F. H., & Javed, M. Y. (2014, December). An Efficient Rule-Based Classification of Diabetes Using ID3, C4.5, & CART Ensembles. In Frontiers of Information Technology (FIT), 2014 12th International Conference on (pp. 226-231). IEEE.
[13] Meng, X. H., Huang, Y. X., Rao, D. P., Zhang, Q., & Liu, Q. (2013). Comparison of three data mining models for predicting diabetes or prediabetes by risk factors. The Kaohsiung Journal of Medical Sciences, 29(2), 93-99.
[14] Krati Saxena, D., Khan, Z., & Singh, S. (2014). Diagnosis of Diabetes Mellitus using K Nearest Neighbor Algorithm.
[15] Saravananathan, K., & Velmurugan, T. (2016). Analyzing Diabetic Data using Classification Algorithms in Data Mining. Indian Journal of Science and Technology, 9(43).
[16] Guo, Y., Bai, G., & Hu, Y. (2012, December). Using bayes network for prediction of type-2 diabetes. In Internet Technology And Secured Transactions, 2012 International Conference for (pp. 471-472). IEEE.
[17] Al Jarullah, A. A. (2011, April). Decision tree discovery for the diagnosis of type II diabetes. In Innovations in Information Technology (IIT), 2011 International Conference on (pp. 303-307). IEEE.
[18] Negi, A., & Jaiswal, V. (2016, December). A first attempt to develop a diabetes prediction method based on different global datasets. In Parallel, Distributed and Grid Computing (PDGC), 2016 Fourth International Conference on (pp. 237-241). IEEE.
[19] Thirumal, P. C., & Nagarajan, N. (2015). Utilization of data mining techniques for diagnosis of diabetes mellitus-a case study. ARPN Journal of Engineering and Applied Science, 10(1).
[20] Anand, A., & Shakti, D. (2015, September). Prediction of diabetes based on personal lifestyle indicators. In Next Generation Computing Technologies (NGCT), 2015 1st International Conference on (pp. 673-676). IEEE.
[21] Vijayan, V. V., & Anjali, C. (2015, December). Prediction and diagnosis of diabetes mellitus—A machine learning approach. In Intelligent Computational Systems (RAICS), 2015 IEEE Recent Advances in (pp. 122-127). IEEE.
[22] Kumari, V. A., & Chitra, R. (2013). Classification of diabetes disease using support vector machine. International Journal of Engineering Research and Applications, 3(2), 1797-1801.
[23] Prajwala, T. R. (2015). A comparative study on decision tree and random forest using R tool. International Journal of Advanced Research in Computer and Communication Engineering, 4, 196-1.
[24] Lee, B. J., Ku, B., Nam, J., Pham, D. D., & Kim, J. Y. (2014). Prediction of fasting plasma glucose status using anthropometric measures for diagnosing type 2 diabetes. IEEE Journal of Biomedical and Health Informatics, 18(2), 555-561.
[25] Pavate, A., & Ansari, N. (2015, September). Risk Prediction of Disease Complications in Type 2 Diabetes Patients Using Soft Computing Techniques. In Advances in Computing and Communications (ICACC), 2015 Fifth International Conference on (pp. 371-375). IEEE.
[26] Mekruksavanich, S. (2016, August). Medical expert system based ontology for diabetes disease diagnosis. In Software Engineering and Service Science (ICSESS), 2016 7th IEEE International Conference on (pp. 383-389). IEEE.
[27] Hashi, E. K., Zaman, M. S. U., & Hasan, M. R. (2017, February). An expert clinical decision support system to predict disease using classification techniques. In Electrical, Computer and Communication Engineering (ECCE), International Conference on (pp. 396-400). IEEE.
[28] Kang, S., Kang, P., Ko, T., Cho, S., Rhee, S. J., & Yu, K. S. (2015). An efficient and effective ensemble of support vector machines for anti-diabetic drug failure prediction. Expert Systems with Applications, 42(9), 4265-4273.
[29] Ramzan, M. (2016, August). Comparing and evaluating the performance of WEKA classifiers on critical diseases. In Information Processing (IICIP), 2016 1st India International Conference on (pp. 1-4). IEEE.
[30] Varma, K. V., Rao, A. A., Lakshmi, T. S. M., & Rao, P. N. (2014). A computational intelligence approach for a better diagnosis of diabetic patients. Computers & Electrical Engineering, 40(5), 1758-1765.
[31] Aljumah, A. A., Ahamad, M. G., & Siddiqui, M. K. (2013). Application of data mining: Diabetes health care in young and old patients. Journal of King Saud University-Computer and Information Sciences, 25(2), 127-136.
[32] Patil, B. M., Joshi, R. C., & Toshniwal, D. (2010). Hybrid prediction model for type-2 diabetic patients. Expert Systems with Applications, 37(12), 8102-8108.
[33] Mounika, M., Suganya, S. D., Vijayashanthi, B., & Anand, S. K. (2015). Predictive analysis of diabetic treatment using classification algorithm. IJCSIT, 6, 2502-2505.
[34] Nai-arun, N., & Moungmai, R. (2015). Comparison of classifiers for the risk of diabetes prediction. Procedia Computer Science, 69, 132-142.
[35] Bashir, S., Qamar, U., Khan, F. H., & Naseem, L. (2016). HMV: a medical decision support framework using multi-layer classifiers for disease prediction. Journal of Computational Science, 13, 10-25.
[36] Kumar Dewangan, A., & Agrawal, P. (2015). Classification of Diabetes Mellitus Using Machine Learning Techniques. International Journal of Engineering and Applied Sciences, 2(5), 145-148.
[37] Lukmanto, R. B., & Irwansyah, E. (2015). The Early Detection of Diabetes Mellitus (DM) Using Fuzzy Hierarchical Model. Procedia Computer Science, 59, 312-319.
[38] Meng, X. H., Huang, Y. X., Rao, D. P., Zhang, Q., & Liu, Q. (2013). Comparison of three data mining models for predicting diabetes or prediabetes by risk factors. The Kaohsiung Journal of Medical Sciences, 29(2), 93-99.
[39] Joshi, R., & Alehegn, M. (2017). Analysis and prediction of diabetes diseases using machine learning algorithm: Ensemble approach.
[40] http://www.who.int/mediacentre/factsheets/fs312/en/
[41] University of Waikato. Downloading and installing Weka. Available from: http://www.cs.waikato.ac.nz/ml/weka/
[42] https://www.kaggle.com/uciml/pima-indians-diabetes-database

Analysis and Prediction of Diabetes Mell

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Analysis and Prediction of Diabetes Mell

Uploaded by

Copyright:

Available Formats

Analysis and prediction of diabetes Mellitus using machine learning

Machine learning Algorithm Table 1: The predictive accuracy (in percentage

Validation Classification Accuracy Incorrectly

Fig 1: Proposed Work Flow

88.8 88.54 85.68

Fig2: Accuracy of Algorithm

4. Discussion Cross Validation

You might also like