Abstract— In recent years, the global prevalence of diabetes mellitus has increased, and with COVID-19, people with diabetes mellitus are the most likely to develop a critical clinical picture of this disease. Early diagnosis is therefore important, and machine learning has played a vital role in it in recent years. In this study we conducted a systematic review of 55 studies in which models for predicting diabetes mellitus and its different types were developed or implemented. The articles were retrieved from the databases IEEE Xplore, Scopus, ScienceDirect, IOPscience, EBSCOhost, and Wiley. The results show that one of the models based on Support Vector Machine algorithms achieved 100% accuracy in disease prediction. The vast majority of studies used the Weka platform as a modeling tool, but the best-performing models were developed in MATLAB (100%) and RStudio (99%). Most studies seek to predict diabetes without specifying the type; however, a considerable number of articles predict type 2 diabetes.

Keywords—Diabetes Mellitus, Diabetes Types, Diabetes Gestational, Machine Learning, Systematic review

I. INTRODUCTION

Over the years, diabetes has become a global public health problem. Recent studies show that more than 381 million people over the age of 18 suffer from diabetes and that approximately 45.8% of them have not yet been diagnosed [1].

This disease is classified into three types. Type 1 diabetes is caused by insulin deficiency. Type 2 diabetes is caused by varying degrees of insulin resistance, altered insulin secretion, increased glucose production, and various genetic and metabolic defects in the action of insulin. Finally, gestational diabetes occurs in women during pregnancy [2].

Most physicians would agree that this disease, which is largely related to lifestyle, can be prevented. Unfortunately, the medical community has been largely absent from the battle to improve these conditions. In fact, numerous studies show that physicians discuss weight management, physical activity, or proper nutrition with fewer than 40% of the people they see in their offices [3].

In recent years, health problems have been widespread, especially in low-income environments, and the situation is aggravated by the limited capacity of the health system [4]. This is why it is important to develop and implement technologies such as machine learning models that serve as tools for doctors and patients, supporting preventive medicine that can help diagnose patients early and provide them with health advice.

The aim of this article is to analyze and report on the machine learning models used to detect and predict diabetes mellitus and its types. It provides an analytical summary based on research conducted in different countries around the world in the last 4 years.

II. METHODOLOGY

A. Type of study

This article uses a systematic review of the scientific literature, a process that collects relevant evidence on a given topic according to established eligibility criteria in order to answer the formulated research questions [5].

B. Research questions

The proposed research questions are as follows:

RQ1: Which diabetes mellitus prediction models have shown the best results according to performance metrics over the past 4 years?

RQ2: Which tools and languages are the most widely used in the world to develop or implement machine learning models for diabetes mellitus prediction?

RQ3: In which countries has the most research related to the prediction of diabetes mellitus been conducted in the last 4 years?

RQ4: What type of diabetes mellitus has had the highest amount of scientific research focused on prediction with machine learning worldwide?
Figure 2. Document Inclusion and Exclusion Flowchart (sources: IEEE, SCOPUS, ScienceDirect, IOPscience, EBSCOhost; stages: duplicate removal, screening, eligibility, included; exclusion reasons: did not meet the inclusion criteria, did not address the research question; articles assessed for eligibility: n=55; articles excluded with reasons: n=0; total: 55 papers)

D. Inclusion and Exclusion criteria
The inclusion and exclusion criteria presented in the following table were applied in the systematic review.

Criteria
I01    Articles related to the development or performance comparison of diabetes mellitus prediction models.
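The screening steps described in this section (duplicate removal followed by the inclusion criteria) can be sketched as a small filter. This is only an illustration under stated assumptions, not the procedure the authors actually ran: the `Record` fields, the title-based duplicate check, the 2018 to 2021 publication window (inferred from the "last 4 years" scope), and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One retrieved article. All fields are hypothetical stand-ins."""
    title: str
    database: str
    year: int
    is_prediction_model_study: bool  # stands in for criterion I01


def remove_duplicates(records):
    """Keep the first occurrence of each title, ignoring case and outer spaces."""
    seen = set()
    unique = []
    for rec in records:
        key = rec.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def meets_inclusion_criteria(rec, first_year=2018, last_year=2021):
    """Criterion I01 plus an assumed publication window."""
    return rec.is_prediction_model_study and first_year <= rec.year <= last_year


def screen(records):
    """Duplicate removal, then inclusion screening."""
    return [r for r in remove_duplicates(records) if meets_inclusion_criteria(r)]


# Hypothetical records, only to exercise the pipeline.
sample = [
    Record("SVM for diabetes prediction", "Scopus", 2019, True),
    Record("svm for diabetes prediction ", "IEEE Xplore", 2019, True),  # duplicate
    Record("Older diabetes model", "Wiley", 2016, True),                # outside window
    Record("Diabetes care survey", "ScienceDirect", 2020, False),       # fails I01
]
kept = screen(sample)
print([r.title for r in kept])
```

Matching duplicates on normalized titles is a simplification; a DOI, where available, would be a more reliable key.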
The following graph represents the percentage
contribution of each of the databases in this review.
TABLE III. ARTICLES BY COUNTRY

Country            Articles   References
Israel             1          [48]
Italy              1          [49]
North Macedonia    1          [50]
Morocco            1          [51]
Pakistan           1          [52]
Poland             1          [53]
USA                5          [54],[55],[56],[57],[58]
Total              55

Figure 4. Articles by year and database (x-axis: 2018 to 2021; series: EBSCOhost, IEEE Xplore, IOPScience, ScienceDirect, SCOPUS, WILEY)

The following graph represents the number of articles published by continent, where Asia is predominant.

[Bar chart of articles by continent; legible values: Africa 1, Asia 45]
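To put the counts above in proportion, each one can be expressed as a share of the 55 reviewed articles. The following is a minimal sketch of that arithmetic; it uses only the values legible in the source (Asia 45, Africa 1, USA 5, total 55), and the helper name `share` is an illustrative choice.

```python
def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 1)


TOTAL_ARTICLES = 55

# Only the continent values that are legible in the source figure.
articles_by_continent = {"Asia": 45, "Africa": 1}

shares = {name: share(n, TOTAL_ARTICLES) for name, n in articles_by_continent.items()}
print(shares)                     # {'Asia': 81.8, 'Africa': 1.8}
print(share(5, TOTAL_ARTICLES))   # USA, from Table III: 9.1
```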
The following graph represents the number of articles by programming language used in the research.

Other systematic review studies, such as [59], focus on analyzing diabetes predictions based on machine learning and deep learning techniques published in the last six years (2013 to 2019), where the main datasets were identified as Electrocardiograms, Breath Dataset, ICU Datasets, and PIMA Indian Diabetes, with the latter being predominant. Likewise, the article determined that the most used classifiers for diabetes prediction are Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree, and Naive Bayes.

In [60], 31 articles were selected to identify the applications of artificial intelligence (AI) for the care of type 2 diabetes mellitus; the main applications were screening and diagnosis. Among all the AI methods reviewed, machine learning methods were the most applied techniques, and the most used methods were Support Vector Machine and Naive Bayes. In the same way, the most important variables used in the

RQ2: Which tools and languages are the most widely used in the world to develop or implement machine learning models for diabetes mellitus prediction?

The tools most commonly used worldwide to develop or implement machine learning models are the free software platform Weka, chosen for its ease of use, MATLAB with the M language, and RStudio with the R language. The latter two tools produced the models with the highest accuracy scores.

The vast majority of research is conducted in India, followed by China. This result indicates that these countries have the greatest experience with, and preference for, the development of predictive models for diabetes.

Likewise, most studies do not focus on a specific type of diabetes mellitus, although a considerable number seek to predict type 2 diabetes mellitus.
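Accuracy, the metric by which models are compared in this review, is the fraction of cases that a classifier labels correctly. As a minimal sketch of how it is computed, the function below compares true and predicted labels; the label lists are hypothetical and are not results from any reviewed study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

result = accuracy(y_true, y_pred)
print(result)  # 6 of 8 predictions match: 0.75
```

A reported accuracy of 100% corresponds to every predicted label matching its true label on the evaluation data.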
Based on the results obtained, future research within the scope of this review is recommended to develop predictive models with the Support Vector Machine algorithm, given the good results obtained, to consider accuracy as a metric for evaluating model performance, and to use MATLAB or RStudio as development tools.

VI. REFERENCES

[1] N. Nnamoko, A. Hussain, and D. England, “Predicting Diabetes Onset: An Ensemble Supervised Learning Approach,” 2018 IEEE Congr. Evol. Comput. CEC 2018 - Proc., pp. 1–7, 2018, doi: 10.1109/CEC.2018.8477663.
[2] I. Gnanadass, “Prediction of Gestational Diabetes by Machine Learning Algorithms,” IEEE Potentials, vol. 39, no. 6, pp. 32–37, 2020, doi: 10.1109/MPOT.2020.3015190.
[3] J. M. Rippe, “The Silent Epidemic,” Am. J. Med., vol. 134, no. 2, pp. 164–165, 2021, doi: 10.1016/j.amjmed.2020.09.028.
[4] WHO, “Recommendations for people living with NCDs, caregivers, family members and the public,” World Heal. Organ., no. April, pp. 1–6, 2020, [Online]. Available: https://apps.who.int/iris/handle/10665/331473.
[5] W. Mengist, T. Soromessa, and G. Legese, “Method for conducting systematic literature review and meta-analysis for environmental science research,” MethodsX, vol. 7, p. 100777, 2020, doi: 10.1016/j.mex.2019.100777.
[6] S. P. Chatrati et al., “Smart home health monitoring system for predicting type 2 diabetes and hypertension,” J. King Saud Univ. - Comput. Inf. Sci., no. xxxx, Jan. 2020, doi: 10.1016/j.jksuci.2020.01.010.
[7] S. K. Dey, A. Hossain, and M. M. Rahman, “Implementation of a Web Application to Predict Diabetes Disease: An Approach Using Machine Learning Algorithm,” 2018 21st Int. Conf. Comput. Inf. Technol. ICCIT 2018, pp. 1–5, 2019, doi: 10.1109/ICCITECHN.2018.8631968.
[8] J. Xue, F. Min, and F. Ma, “Research on diabetes prediction method based on machine learning,” J. Phys. Conf. Ser., vol. 1684, no. 1, 2020, doi: 10.1088/1742-6596/1684/1/012062.
[9] H. Liu et al., “Machine learning risk score for prediction of gestational diabetes in early pregnancy in Tianjin, China,” Diabetes. Metab. Res. Rev., no. February, 2020, doi: 10.1002/dmrr.3397.
[10] G. Li, Y. Liu, H. Li, R. Yao, and C. Li, “MCMC impute missing values and Bayesian variable selection for logistic regression model to predict Pima Indian Diabetes,” J. Phys. Conf. Ser., vol. 1865, no. 4, p. 042087, Apr. 2021, doi: 10.1088/1742-6596/1865/4/042087.
[11] C. Zhu, C. U. Idemudia, and W. Feng, “Improved logistic regression model for diabetes prediction by integrating PCA and K-means techniques,” Informatics Med. Unlocked, vol. 17, no. April, p. 100179, 2019, doi: 10.1016/j.imu.2019.100179.
[12] H. M. Deberneh and I. Kim, “Prediction of type 2 diabetes based on machine learning algorithm,” Int. J. Environ. Res. Public Health, vol. 18, no. 6, pp. 9–11, 2021, doi: 10.3390/ijerph18063317.
[13] Y. Srivastava, P. Khanna, and S. Kumar, “Estimation of Gestational Diabetes Mellitus using Azure AI Services,” Proc. - 2019 Amity Int. Conf. Artif. Intell. AICAI 2019, pp. 321–326, 2019, doi: 10.1109/AICAI.2019.8701307.
[14] M. Tanvir Islam, M. Raihan, F. Farzana, P. Ghosh, and S. Ahmed Shaj, “An empirical study on diabetes mellitus prediction using apriori algorithm,” Adv. Intell. Syst. Comput., vol. 1166, pp. 539–550, 2021, doi: 10.1007/978-981-15-5148-2_48.
[15] M. T. Islam, M. Raihan, F. Farzana, N. Aktar, P. Ghosh, and S. Kabiraj, “Typical and Non-Typical Diabetes Disease Prediction using Random Forest Algorithm,” 2020 11th Int. Conf. Comput. Commun. Netw. Technol. ICCCNT 2020, pp. 1–6, 2020, doi: 10.1109/ICCCNT49239.2020.9225430.
[16] A. Mir and S. N. Dhage, “Diabetes Disease Prediction Using Machine Learning on Big Data of Healthcare,” Proc. - 2018 4th Int. Conf. Comput. Commun. Control Autom. ICCUBEA 2018, pp. 1–6, 2018, doi: 10.1109/ICCUBEA.2018.8697439.
[17] K. L. Priya, M. S. Charan Reddy Kypa, M. M. Sudhan Reddy, and G. R. Mohan Reddy, “A Novel Approach to Predict Diabetes by Using Naive Bayes Classifier,” Proc. 4th Int. Conf. Trends Electron. Informatics, ICOEI 2020, no. Icoei, pp. 603–607, 2020, doi: 10.1109/ICOEI48184.2020.9142959.
[18] R. S. Raj, D. S. Sanjay, M. Kusuma, and S. Sampath, “Comparison of Support Vector Machine and Naïve Bayes Classifiers for Predicting Diabetes,” 1st Int. Conf. Adv. Technol. Intell. Control. Environ. Comput. Commun. Eng. ICATIECE 2019, pp. 41–45, 2019, doi: 10.1109/ICATIECE45860.2019.9063792.
[19] R. Syed, R. K. Gupta, and N. Pathik, “An Advance Tree Adaptive Data Classification for the Diabetes Disease Prediction,” 2018 Int. Conf. Recent Innov. Electr. Electron. Commun. Eng. ICRIEECE 2018, pp. 1793–1798, 2018, doi: 10.1109/ICRIEECE44171.2018.9009180.
[20] G. Tripathi and R. Kumar, “Early Prediction of Diabetes Mellitus Using Machine Learning,” ICRITO 2020 - IEEE 8th Int. Conf. Reliab. Infocom Technol. Optim. (Trends Futur. Dir.), pp. 1009–1014, 2020, doi: 10.1109/ICRITO48877.2020.9197832.
[21] D. Vigneswari, N. K. Kumar, V. Ganesh Raj, A. Gugan, and S. R. Vikash, “Machine Learning Tree Classifiers in Predicting Diabetes Mellitus,” 2019 5th Int. Conf. Adv. Comput. Commun. Syst. ICACCS 2019, pp. 84–87, 2019, doi: 10.1109/ICACCS.2019.8728388.
[22] S. C. Gupta and N. Goel, “Performance enhancement of diabetes prediction by finding optimum K for KNN classifier with feature selection method,” Proc. 3rd Int. Conf. Smart Syst. Inven. Technol. ICSSIT 2020, no. Icssit, pp. 980–986, 2020, doi: 10.1109/ICSSIT48917.2020.9214129.
[23] P. S. Kohli and A. L. Regression, “Application of Machine Learning in Disease Prediction,” 2020 IEEE 5th Int. Conf. Comput. Commun. Autom. ICCCA 2020, pp. 1–4, 2020.
[24] P. Kaur, N. Sharma, A. Singh, and B. Gill, “CI-DPF: A Cloud IoT based Framework for Diabetes Prediction,” 2018 IEEE 9th Annu. Inf. Technol. Electron. Mob. Commun. Conf. IEMCON 2018, pp. 654–660, 2019, doi: 10.1109/IEMCON.2018.8614775.
[25] Karthikeyan S. M, C. P.J, G. C. B, and M. J, “Performance Analysis Based on Data Mining Technique in Predicting the Diabetic Disease – Decision tree and Naïve Bayes,” 2019 1st Int. Conf. Adv. Inf. Technol., pp. 2019–2022, 2019.
[26] S. Thenappan, M. Valan Rajkumar, and P. S. Manoharan, “Predicting Diabetes Mellitus Using Modified Support Vector Machine with Cloud Security,” IETE J. Res., vol. 0, no. 0, pp. 1–11, 2020, doi: 10.1080/03772063.2020.1782781.
[27] R. Patil and S. Tamane, “A comparative analysis on the evaluation of classification algorithms in the prediction of diabetes,” Int. J. Electr. Comput. Eng., vol. 8, no. 5, pp. 3966–3975, 2018, doi: 10.11591/ijece.v8i5.pp3966-3975.
[28] B. Suvarnamukhi and M. Seshashayee, “Big data processing system for diabetes prediction using machine learning technique,” Int. J. Innov. Technol. Explor. Eng., vol. 8, no. 12, pp. 4478–4483, 2019, doi: 10.35940/ijitee.L3515.1081219.
[29] R. G. Franklin and B. Muthukumar, “Detection of diabetes mellitus using machine learning algorithms,” Int. J. Res. Pharm. Sci., vol. 11, no. 4, pp. 6881–6887, 2020, doi: 10.26452/ijrps.v11i4.3662.
[30] P. S. Kumar and S. Pranavi, “Performance analysis of machine learning algorithms on diabetes dataset using big data analytics,” 2017 Int. Conf. Infocom Technol. Unmanned Syst. Trends Futur. Dir. ICTUS 2017, vol. 2018-Janua, no. Iddm, pp. 508–513, 2018, doi: 10.1109/ICTUS.2017.8286062.
[31] P. Pandeeswary and M. Janaki, “Performance analysis of big data classification techniques on diabetes prediction,” Int. J. Innov. Technol. Explor. Eng., vol. 8, no. 10, pp. 533–537, 2019, doi: 10.35940/ijitee.J8840.0881019.
[32] J. Beschi Raja, R. Anitha, R. Sujatha, V. Roopa, and S. Sam Peter, “Diabetics prediction using gradient boosted classifier,” Int. J. Eng. Adv. Technol., vol. 9, no. 1, pp. 3181–3183, 2019, doi: 10.35940/ijeat.A9898.109119.
[33] P. A. Ebenzer, R. Bhattalwar, H. Patel, and R. Kumar, “Patient readmission prediction due to diabetes using machine learning classification,” Int. J. Innov. Technol. Explor. Eng., vol. 9, no. 1, pp. 678–681, 2019, doi: 10.35940/ijitee.A4561.119119.
[34] S. Raghavendra and J. Santosh Kumar, “Performance evaluation of random forest with feature selection methods in prediction of diabetes,” Int. J. Electr. Comput. Eng., vol. 10, no. 1, pp. 353–359, 2020, doi: 10.11591/ijece.v10i1.pp353-359.
[35] M. T. Student, K. Lakshmaih, E. Foundation, and G. District, “Diabetic Prediction Using Kernel Based Support Vector Machine,” vol. 9, no. 2, pp. 1178–1183, 2020.
[36] M. S. Geetha Devasena, R. Kingsy Grace, and G. Gopu, “PDD: Predictive diabetes diagnosis using datamining algorithms,” 2020 Int. Conf. Comput. Commun. Informatics, ICCCI 2020, pp. 22–25, 2020, doi: 10.1109/ICCCI48352.2020.9104108.
[37] S. C. Gupta and N. Goel, “Enhancement of Performance of K-Nearest Neighbors Classifiers for the Prediction of Diabetes Using Feature Selection Method,” 2020 IEEE 5th Int. Conf. Comput. Commun. Autom. ICCCA 2020, pp. 681–686, 2020, doi: 10.1109/ICCCA49541.2020.9250887.
[38] V. L. Helen Josephine, A. P. Nirmala, and V. L. Alluri, “Impact of Hidden Dense Layers in Convolutional Neural Network to enhance Performance of Classification Model,” IOP Conf. Ser. Mater. Sci. Eng., vol. 1131, no. 1, p. 012007, Apr. 2021, doi: 10.1088/1757-899X/1131/1/012007.
[39] A. Mujumdar and V. Vaidehi, “Diabetes Prediction using Machine Learning Algorithms,” Procedia Comput. Sci., vol. 165, pp. 292–299, 2019, doi: 10.1016/j.procs.2020.01.047.
[40] P. Samant and R. Agarwal, “Machine learning techniques for medical diagnosis of diabetes using iris images,” Comput. Methods Programs Biomed., vol. 157, pp. 121–128, 2018, doi: 10.1016/j.cmpb.2018.01.004.
[41] D. Sisodia and D. S. Sisodia, “Prediction of Diabetes using Classification Algorithms,” Procedia Comput. Sci., vol. 132, no. Iccids, pp. 1578–1585, 2018, doi: 10.1016/j.procs.2018.05.122.
[42] D. Jashwanth Reddy et al., “Predictive machine learning model for early detection and analysis of diabetes,” Mater. Today Proc., no. xxxx, 2020, doi: 10.1016/j.matpr.2020.09.522.
[43] N. P. Tigga and S. Garg, “Prediction of Type 2 Diabetes using Machine Learning Classification Methods,” Procedia Comput. Sci., vol. 167, no. 2019, pp. 706–716, 2020, doi: 10.1016/j.procs.2020.03.336.
[44] B. Jain, N. Ranawat, P. Chittora, P. Chakrabarti, and S. Poddar, “A machine learning perspective: To analyze diabetes,” Mater. Today Proc., no. xxxx, 2021, doi: 10.1016/j.matpr.2020.12.445.
[45] M. Radja and A. W. R. Emanuel, “Performance Evaluation of Supervised Machine Learning Algorithms Using Different Data Set Sizes for Diabetes Prediction,” Proceeding - 2019 5th Int. Conf. Sci. Inf. Technol. Embrac. Ind. 4.0 Towar. Innov. Cyber Phys. Syst. ICSITech 2019, pp. 252–258, 2019, doi: 10.1109/ICSITech46713.2019.8987479.
[46] R. Aminah and A. H. Saputro, “Diabetes prediction system based on iridology using machine learning,” 2019 6th Int. Conf. Inf. Technol. Comput. Electr. Eng. ICITACEE 2019, pp. 1–6, 2019, doi: 10.1109/ICITACEE.2019.8904125.
[47] R. B. Lukmanto, Suharjito, A. Nugroho, and H. Akbar, “Early detection of diabetes mellitus using feature selection and fuzzy support vector machine,” Procedia Comput. Sci., vol. 157, pp. 46–54, 2019, doi: 10.1016/j.procs.2019.08.140.
[48] A. Cahn et al., “Prediction of progression from pre-diabetes to diabetes: Development and validation of a machine learning model,” Diabetes. Metab. Res. Rev., vol. 36, no. 2, pp. 1–8, 2020, doi: 10.1002/dmrr.3252.
[49] E. Cordelli, G. Maulucci, M. De Spirito, A. Rizzi, D. Pitocco, and P. Soda, “A decision support system for type 1 diabetes mellitus diagnostics based on dual channel analysis of red blood cell membrane fluidity,” Comput. Methods Programs Biomed., vol. 162, pp. 263–271, 2018, doi: 10.1016/j.cmpb.2018.05.025.
[50] L. Loku, B. Fetaji, and M. Fetaji, “Prevention of Diabetes by Devising A Prediction Analytics Model,” HORA 2020 - 2nd Int. Congr. Human-Computer Interact. Optim. Robot. Appl. Proc., pp. 1–4, 2020, doi: 10.1109/HORA49412.2020.9152894.
[51] T. Nibareke and J. Laassiri, “Using Big Data-machine learning models for diabetes prediction and flight delays analytics,” J. Big Data, vol. 7, no. 1, 2020, doi: 10.1186/s40537-020-00355-0.
[52] T. Mahboob Alam et al., “A model for early prediction of diabetes,” Informatics Med. Unlocked, vol. 16, no. July, p. 100204, 2019, doi: 10.1016/j.imu.2019.100204.
[53] A. Viloria, Y. Herazo-Beltran, D. Cabrera, and O. B. Pineda, “Diabetes Diagnostic Prediction Using Vector Support Machines,” Procedia Comput. Sci., vol. 170, pp. 376–381, 2020, doi: 10.1016/j.procs.2020.03.065.
[54] R. Lee and C. Chitnis, “Improving health-care systems by disease prediction,” Proc. - 2018 Int. Conf. Comput. Sci. Comput. Intell. CSCI 2018, pp. 726–731, 2018, doi: 10.1109/CSCI46756.2018.00145.
[55] R. Deo and S. Panigrahi, “Performance Assessment of Machine Learning Based Models for Diabetes Prediction,” 2019 IEEE Healthc. Innov. Point Care Technol. HI-POCT 2019, pp. 147–150, 2019, doi: 10.1109/HI-POCT45284.2019.8962811.
[56] J. Ma, “Machine Learning in Predicting Diabetes in the Early Stage,” Proc. - 2020 2nd Int. Conf. Mach. Learn. Big Data Bus. Intell. MLBDBI 2020, pp. 167–172, 2020, doi: 10.1109/MLBDBI51377.2020.00037.
[57] L. Kopitar, P. Kocbek, L. Cilar, A. Sheikh, and G. Stiglic, “Early detection of type 2 diabetes mellitus using machine learning-based prediction models,” Sci. Rep., vol. 10, no. 1, pp. 1–12, 2020, doi: 10.1038/s41598-020-68771-z.
[58] J. J. Khanam and S. Y. Foo, “A comparison of machine learning algorithms for diabetes prediction,” ICT Express, no. xxxx, 2021, doi: 10.1016/j.icte.2021.02.004.
[59] S. Larabi-Marie-Sainte, L. Aburahmah, R. Almohaini, and T. Saba, “Current techniques for diabetes prediction: Review and case study,” Appl. Sci., vol. 9, no. 21, 2019, doi: 10.3390/app9214604.
[60] S. Abhari, S. R. N. Kalhori, M. Ebrahimi, H. Hasannejadasl, and A. Garavand, “Artificial intelligence applications in type 2 diabetes mellitus care: Focus on machine learning methods,” Healthc. Inform. Res., vol. 25, no. 4, pp. 248–261, 2019, doi: 10.4258/hir.2019.25.4.248.
[61] K. De Silva, W. K. Lee, A. Forbes, R. T. Demmer, C. Barton, and J. Enticott, “Use and performance of machine learning models for type 2 diabetes prediction in community settings: A systematic review and meta-analysis,” Int. J. Med. Inform., vol. 143, no. August, p. 104268, 2020, doi: 10.1016/j.ijmedinf.2020.104268.