St. Xavier’s College
Affiliated to Tribhuvan University
Maitighar, Kathmandu
Submitted by
Anom Maharjan (T.U. Exam Roll No. 10173/073)
Anuska Sthapit (T.U. Exam Roll No. 10174/073)
Submitted to
St. Xavier’s College
Department of Computer Science
Maitighar, Kathmandu, Nepal
April, 2021
Final Year Project Report
On
A Comparative Analysis and Risk Prediction of Diabetes using ML
Classification Algorithms
A final year project report submitted in partial fulfillment of the
requirements for the degree of Bachelor of Science in Computer Science
and Information Technology awarded by Tribhuvan University.
CERTIFICATE OF APPROVAL
The undersigned certify that they have read and recommended to the Department of Computer
Science for acceptance a project proposal entitled “A Comparative Analysis and Risk
Prediction of Diabetes using ML Classification Algorithms”, submitted by Anom
Maharjan (T.U. Roll No. 10173/073) and Anuska Sthapit (T.U. Roll No. 10174/073)
in partial fulfillment of the requirements for the degree of Bachelor of Science in
Computer Science and Information Technology awarded by Tribhuvan University.
…………………………..
Er. Rajan Karmacharya
Supervisor /Lecturer
St. Xavier’s College
…………………………..
External Examiner
Tribhuvan University
…………………………..
Ganesh Yogi
Head of the Department
Department of Computer Science
St. Xavier’s College
ACKNOWLEDGEMENT
We are privileged to be students of Computer Science at St. Xavier’s College, in a
department staffed by experts in their fields who are highly supportive of their students.
We would like to express our sincere gratitude to our supervisor, Er. Rajan Karmacharya,
for creating an academic and sociable environment in which this project could grow, and for
providing us with the advice, guidelines and resources needed to complete it.
We are also grateful to the entire Computer Science Department of St. Xavier’s College for
providing a suitable environment in which to work on this project and for helping us in
every possible way. We would also like to take this opportunity to thank Mr. Ganesh Yogi
for his continuous encouragement and support throughout the project.
We would also like to express our heartfelt gratitude to Mr. Jeetendra Manandhar, Mr. Bal
Krishna Subedi, Er. Anil Shah, Er. Saugat Sigdel, Er. Nitin Malla, Er. Sansar Dewan,
Er. Sanjay Kumar Yadav, Mr. Ganesh Dhami and Mr. Ramesh Shahi for their constant
support and guidance. Furthermore, we appreciate all our colleagues, seniors and relatives
who were directly or indirectly a part of this project.
ABSTRACT
Diabetes is a major metabolic disorder that can adversely affect the entire body. Early
detection is very important for maintaining a healthy life, yet the usual identification
process is tedious: the patient has to visit a diagnostic center and consult a doctor. The
aim of this project is to design a model that predicts the likelihood of diabetes in
patients as accurately as possible, and to develop a web application that puts the findings
to use. Four machine learning classification algorithms are used: Logistic Regression, KNN,
Random Forest Classifier and Support Vector Machine. Experiments are performed on the Pima
Indians Diabetes Database (PIDD), which is sourced from the UCI machine learning
repository. The performance of the four algorithms is evaluated on several measures:
Precision, Accuracy, F-Measure and Recall. The results show that Logistic Regression
outperforms the other algorithms, with the highest accuracy of 79.8%. These results are
verified using Receiver Operating Characteristic (ROC) curves. After finding the model with
the highest accuracy, a web application is developed using Python and Flask. This web
application helps patients estimate the likelihood of diabetes without having to go to a
diagnostic center and consult a doctor for regular checkups.
Keywords: Diabetes, SVM, Naive Bayes, Decision Tree, Accuracy, Machine Learning
TABLE OF CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Project Objectives and Scope
1.4 Significance of the Study
1.5 Project Features
1.6 Requirement Analysis
1.6.1 Dataset
1.7 Feasibility Study
1.8 System Requirement – Minimum Hardware and Software
1.8.1 Platforms
CHAPTER 2: LITERATURE REVIEW
2.1 Machine Learning
2.2 Related Research
2.3 Proposed Methodology
2.3.1 Dataset Description
2.3.2 Dataset Preprocessing
2.3.3 Applying Machine Learning Models
CHAPTER 3: SYSTEM DEVELOPMENT
3.1 Project Management Strategy and Tools
3.1.1 Project Workflow and Schedule
3.1.2 Project Team
3.1.3 Responsibilities
3.2 System Design
3.2.1 Data Preprocessing
3.2.2 Applying Machine Learning Models and Evaluating their Performance
3.2.3 Model Comparison
3.2.4 Feature Importance
3.2.5 Save and Load Model
3.3 Web Application Architecture
3.4 Development Tools
3.4.1 Visual Studio Code
3.4.2 Python and Flask
3.4.3 Heroku
3.4.4 Jupyter Notebook
3.5 Project Schedule
CHAPTER 4: RESULT ANALYSIS
4.1 Experimental Results and Observations
4.2 System Results
4.2.1 Screenshots and codes
4.3 Critical Analysis
4.4 Limitations and Future Enhancements
4.4.1 Limitations
4.4.2 Future Enhancements
4.5 Conclusion
REFERENCES
LIST OF ABBREVIATIONS
ML Machine Learning
KNN K-Nearest Neighbors
SVM Support Vector Machine
LR Logistic Regression
RFC Random Forest Classifier
ANN Artificial Neural Network
DT Decision Tree
UCI University of California, Irvine
ROC Receiver Operating Characteristic
PIDD Pima Indians Diabetes Database
FPG Fasting Plasma Glucose
IGT Impaired Glucose Tolerance
OGTT Oral Glucose Tolerance Test
WHO World Health Organization
RAM Random Access Memory
GB Gigabyte
AI Artificial Intelligence
NDA Nepal Diabetes Association
NCD Non-Communicable Disease
GP Genetic Programming
FBS Fasting Blood Sugar
CHAPTER 1: INTRODUCTION
1.1 Background
Prediabetes was first recognized as an intermediate diagnosis, indicating a relatively high
risk of developing diabetes in the future, by the Expert Committee on Diagnosis and
Classification of Diabetes Mellitus in 1997. It has been reported that approximately 5–10%
of patients with untreated prediabetes subsequently develop diabetes. The definition of
prediabetes includes a fasting plasma glucose (FPG) level in the range of 100–125 mg/dL
(5.6–6.9 mmol/L), impaired glucose tolerance (IGT), i.e. an oral glucose tolerance test
(OGTT) 2 h measurement in the range of 140–199 mg/dL (7.8–11.0 mmol/L), or an HbA1c level
in the range of 5.7–6.4% (39–46 mmol/mol). Early diagnosis and intervention for prediabetes
could prevent complications, prevent or delay the transition to diabetes, and be
cost-effective. [1]
Machine learning is an area of artificial intelligence research that uses statistical
methods for data classification. Several machine learning techniques have been applied in
clinical settings to predict disease and have shown higher diagnostic accuracy than
classical methods. This project is a small example of how machine learning can be used to
predict prediabetes.
In this project, the authors aimed to develop and validate models to predict prediabetes
using Logistic Regression, KNN, Random Forest Classifier and Support Vector Machine
(SVM), which could be effective as simple and accurate screening tools. The performance of
the four algorithms is evaluated on several measures: Precision, Accuracy, F-Measure and
Recall. Accuracy is measured as the share of correctly classified instances among all
classified instances. The results are verified using Receiver Operating Characteristic
(ROC) curves.
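The four measures named above can be computed directly from the counts of true and false positives and negatives. The following is a minimal sketch in plain Python, not the project's own evaluation code; the function name and the sample labels are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Made-up labels for illustration only (1 = diabetic, 0 = non-diabetic).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

With these sample labels there are three true positives, three true negatives, one false positive and one false negative, so all four measures come out at 0.75.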
Requirements analysis is the first stage in the systems engineering and software
development processes. Its main purpose is to describe the functional and non-functional
requirements of the project. This project provides a web application, developed and
deployed around a validated model, that helps patients predict prediabetes. It reduces the
manual process of going to diagnostic centers and doctors for follow-up checkups by
providing an automated application whose predictive model reached an accuracy of 79.8% in
our experiments.
1.6.1 Dataset
The dataset, which covers female patients of the Pima Indian population aged at least
twenty-one, has been taken from the UCI machine learning repository. [USD] The dataset is
originally owned by the National Institute of Diabetes and Digestive and Kidney Diseases.
It contains a total of 768 instances classified into two classes, diabetic and
non-diabetic, with eight risk factors: number of times pregnant, plasma glucose
concentration at two hours in an oral glucose tolerance test, diastolic blood pressure,
triceps skin fold thickness, two-hour serum insulin, body mass index, diabetes pedigree
function and age, as in Table 1.
A feasibility study of the project was performed to find out whether the project is
practical in the given circumstances:
As the application is convenient and very easy to use, it is flexible and user-friendly in
the present environment.
Internal memory: 500 GB
Processors: Intel i5/i7
In Nepal, obesity is the cause of diabetes among 16.6 percent of the female population and
13.6 percent of the male population, as stated by the World Health Organization. Likewise,
dullness is identified as another cause among 3.3 percent of the population. Doctors have
pointed out that Nepal is at high risk of diabetes. According to the WHO, there is no exact
data on patients with diabetes in Nepal, but the 2016 Diabetes Profile shows that 9.1
percent of the Nepali population is living with diabetes: 10.5 percent of men and 7.9
percent of women. With the increasing population and changing lifestyles, the burden of
non-communicable diseases (NCDs) is very high, especially in urban areas. Basic treatment
services for diabetes are now available in many places across the country, but for major
treatment patients have to come to the cities. The flow of patients is also high at the
central hospital. These diseases require prolonged treatment, which places an extra
financial burden on families with low incomes. The government has prioritized NCDs,
including diabetes, in the National Health Policy 2015 as well as the National Health
Sector Strategy 2015-2020. [Kri18]
Orabi et al. [Sin18] designed a system for diabetes prediction whose main aim is to predict
whether a candidate suffers from diabetes at a particular age. The proposed system is based
on machine learning and applies a decision tree. The results were satisfactory, as the
designed system predicts diabetes incidents at a particular age with higher accuracy using
the decision tree.
Pradhan et al. [Pra] used Genetic Programming (GP) for training and testing on the Diabetes
data set sourced from the UCI repository. The results achieved using Genetic Programming
give optimal accuracy compared with the other implemented techniques. Accuracy can improve
significantly while taking less time for classifier generation, which makes the approach
useful for diabetes prediction at low cost.
Rashid et al. [Abd15] designed a prediction model with two sub-modules to predict diabetes
as a chronic disease. An Artificial Neural Network (ANN) is used in the first module and
Fasting Blood Sugar (FBS) is used in the second module. A Decision Tree (DT) is used to
detect the symptoms of diabetes in the patient’s health.
Nongyao et al. [Dee] applied an algorithm that classifies the risk of diabetes mellitus. To
fulfill this objective, the authors employed four well-known machine learning
classification methods: Decision Tree, Artificial Neural Networks, Logistic Regression and
Naive Bayes. Bagging and Boosting techniques are used to improve the robustness of the
designed model. The experimental results show that the Random Forest algorithm gives the
best results among all the algorithms employed.
Kandhasamy and Balamurali [Jpr15] used multiple classifiers: SVM, J48, K-Nearest Neighbors
(KNN) and Random Forest. The classification was performed on a dataset taken from the UCI
repository. The results of the classifiers were compared based on accuracy, sensitivity and
specificity. The classification was done in two cases, with and without pre-processing of
the dataset, using 5-fold cross validation. The authors did not explain the pre-processing
step applied to the dataset; they only mentioned that noise was removed from the data. They
reported that the decision tree J48 classifier had the highest accuracy rate, 73.82%,
without pre-processing, while the KNN (k = 1) and Random Forest classifiers showed the
highest accuracy rate of 100% after pre-processing the data.
Moreover, Yuvaraj and Sripreethaa [NYu19] presented an application for diabetes prediction
using three different ML algorithms: Random Forest, Decision Tree and Naïve Bayes. The Pima
Indian Diabetes dataset (PID) was used after pre-processing it. The authors did not mention
how the data was pre-processed; however, they discussed the Information Gain method used
for feature selection to extract the relevant features. They used only eight main
attributes among 13 (see Table A4). In addition, they divided the dataset into 70% for
training and 30% for testing. The results showed that the random forest algorithm had the
highest accuracy rate of 94%.
Furthermore, Tafa et al. [ZTa15] proposed a new integrated, improved model of SVM and Naïve
Bayes for predicting diabetes. The model was evaluated using a dataset collected from three
different locations in Kosovo. The dataset contains eight attributes and 402 patients, of
whom 80 had type 2 diabetes. Some attributes utilized in this study (see Table A4) had not
been investigated before, including regular diet, physical activity and family history of
diabetes. The authors did not mention whether the data was pre-processed. For the
validation test, they split the dataset into 50% each for the training and testing sets.
The proposed combined algorithm improved the accuracy of the prediction to 97.6%. This
value was compared with the performance of SVM and Naïve Bayes alone, which achieved 95.52%
and 94.52%, respectively.
Figure 2 Bar Graph representing ratio of Diabetic and Non Diabetic Patient
Figure 3 Pie Chart representing ratio of Diabetic and Non Diabetic Patient
Data preprocessing is the most important process. Healthcare-related data mostly contains
missing values and other impurities that decrease the effectiveness of the data. Data
preprocessing is done to improve the quality and effectiveness of what the mining process
yields, and it is essential for accurate results and successful prediction when Machine
Learning techniques are applied to the dataset. For the Pima Indian diabetes dataset we
need to perform preprocessing in two steps.
Missing value removal- Remove all the instances that have zero (0) as a value. Zero is
not a possible value for these measurements, so such instances are eliminated. By
eliminating irrelevant features/instances we make a feature subset; this process is
called feature subset selection, which reduces the dimensionality of the data and helps
the models work faster.
Splitting of data- After cleaning, the data is normalized and split for training and
testing the model. Once the data is split, we train the algorithm on the training data
set and keep the test data set aside. The training process produces a trained model
based on the algorithm's logic and the feature values in the training data. The aim of
normalization is to bring all the attributes onto the same scale.
Once the data is ready, we apply the Machine Learning techniques. [Mit]
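The zero-value removal and normalization steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the project's notebook code: the column names, the choice of columns in which zero counts as missing, the use of min-max scaling and the sample records are all hypothetical.

```python
# Columns in which a zero reading is treated as missing (an assumption;
# a zero in "pregnancies" is a legitimate value and is kept).
ZERO_INVALID = ("glucose", "blood_pressure", "bmi")


def drop_zero_rows(rows, columns=ZERO_INVALID):
    """Keep only the rows with non-zero values in every listed column."""
    return [row for row in rows if all(row[c] != 0 for c in columns)]


def min_max_scale(rows, columns):
    """Rescale each listed column to the range [0, 1]."""
    scaled = [dict(row) for row in rows]
    for c in columns:
        lo = min(row[c] for row in rows)
        hi = max(row[c] for row in rows)
        span = (hi - lo) or 1  # avoid division by zero for constant columns
        for row in scaled:
            row[c] = (row[c] - lo) / span
    return scaled


# Made-up records for illustration only.
records = [
    {"pregnancies": 0, "glucose": 100, "blood_pressure": 70, "bmi": 25.0, "outcome": 0},
    {"pregnancies": 2, "glucose": 0, "blood_pressure": 80, "bmi": 30.0, "outcome": 1},
    {"pregnancies": 4, "glucose": 150, "blood_pressure": 90, "bmi": 35.0, "outcome": 1},
    {"pregnancies": 1, "glucose": 125, "blood_pressure": 0, "bmi": 28.0, "outcome": 0},
]
clean = drop_zero_rows(records)
normalized = min_max_scale(clean, ["pregnancies", "glucose", "blood_pressure", "bmi"])
```

Two of the four sample records contain an impossible zero and are dropped. In practice the minimum and maximum would be computed on the training portion only and then reused for the test portion, so that no information from the test set leaks into training.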
Different classification techniques are used to predict diabetes. The main objective is to
apply Machine Learning techniques, analyze their performance and accuracy, and identify the
important features that play a major role in prediction. The techniques used are as
follows-
Algorithm-
o Select the hyperplane that divides the classes better.
o To find the better hyperplane, calculate the distance between the hyperplane and
the data points, which is called the margin.
o If the distance between the classes is low, the chance of misclassification is
high, and vice versa. So we need to select the hyperplane that has the highest
margin.
o Margin = distance to positive point + distance to negative point.
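The margin rule in the steps above, which describe a support vector machine, can be illustrated with a few lines of plain Python. This sketch only compares two hand-picked candidate hyperplanes on made-up points; it is not an SVM solver, and every name and value in it is an assumption made for illustration.

```python
import math


def distance_to_hyperplane(point, w, b):
    """Perpendicular distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


def margin(positives, negatives, w, b):
    """Distance to the nearest positive point plus distance to the nearest negative point."""
    return (min(distance_to_hyperplane(p, w, b) for p in positives)
            + min(distance_to_hyperplane(n, w, b) for n in negatives))


# Made-up, linearly separable 2-D points for illustration only.
positives = [(3.0, 0.0), (3.0, 1.0)]
negatives = [(0.0, 0.0), (0.0, 1.0)]

# Two hypothetical separating hyperplanes, each given as (w, b).
candidates = {
    "x = 1.5": ((1.0, 0.0), -1.5),
    "x + y = 2": ((1.0, 1.0), -2.0),
}
margins = {name: margin(positives, negatives, w, b)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
```

Both candidates separate the two classes, but the vertical line x = 1.5 leaves a margin of 3.0 while the tilted line x + y = 2 leaves only about 1.41, so the first one is preferred.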
2.3.3.2 K-Nearest Neighbor
KNN is also a supervised machine learning algorithm. It can solve both classification and
regression problems. KNN is a lazy learning technique: it stores the training records and
defers all computation until a prediction is requested. It assumes that similar things are
near to each other, which is often true of similar data points, and it classifies new cases
based on a similarity measure. A tree-like structure can be used to find the distance
between points efficiently. To make a prediction for a new data point, the algorithm finds
the closest data points in the training data set, its nearest neighbors. Here K is the
number of nearby neighbors, and it is always a positive integer. The neighbor's value is
chosen from the set of classes. Closeness is mainly defined in terms of Euclidean distance.
The Euclidean distance between two points P and Q, i.e. P (p1, p2, ..., pn) and
Q (q1, q2, ..., qn), is defined by the following equation:
$$d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
Equation 1: Euclidean distance
Algorithm-
o Take the Pima Indian Diabetes dataset as the sample dataset of rows and
columns.
o Take a test dataset of attributes and rows.
o Find the Euclidean distance between each test row and the sample rows using Equation 1.
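The distance computation and the nearest-neighbor vote can be sketched in plain Python as below. This is a minimal illustration, not the scikit-learn classifier used for the reported results; the function names, the value of k and the sample points are assumptions.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(train_X, train_y, query, k=3):
    """Majority class among the k training points nearest to the query."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: euclidean(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up, already-normalized points (two features) for illustration only.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
           (0.8, 0.9), (0.9, 0.8), (0.85, 0.75)]
train_y = [0, 0, 0, 1, 1, 1]

prediction = knn_predict(train_X, train_y, query=(0.7, 0.8), k=3)
```

The query point lies next to the three points labeled 1, so the vote returns class 1. An odd k avoids tied votes in a two-class problem.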
The main aim of logistic regression is to find the best fit describing the relationship
between the target and the predictor variables. Logistic regression is based on the linear
regression model: it passes a linear combination of the predictors through the sigmoid
function to predict the probability of the positive and negative classes.
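As a small sketch of how the sigmoid turns a linear score into a class probability, consider the following plain-Python example. The weights, bias, feature values and the 0.5 decision threshold are hypothetical and are not the coefficients fitted in this project.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one row of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients, not the fitted values from this project.
weights = [0.8, 0.5]
bias = -1.0

p = predict_proba([1.5, 0.6], weights, bias)
label = 1 if p >= 0.5 else 0  # 0.5 is the usual default threshold
```

Here the linear score is -1.0 + 0.8 * 1.5 + 0.5 * 0.6 = 0.5, which the sigmoid maps to a probability of about 0.62, so the row is assigned to the positive class.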
Random Forest is an ensemble learning method used for both classification and regression
tasks. It often gives higher accuracy than other models and can easily handle large
datasets. Random Forest was developed by Leo Breiman and is a popular ensemble learning
method. It improves on the performance of a single decision tree by reducing variance. It
operates by constructing a multitude of decision trees at training time and outputting the
class that is the mode of the classes (classification) or the mean prediction (regression)
of the individual trees.
Algorithm-
o The first step is to select R features from the total M features, where R << M.
o Among the R features, find the node using the best split point.
o Split the node into sub-nodes using the best split.
o Repeat the first three steps until l number of nodes has been reached.
o Build the forest by repeating the first four steps n times to create n number of
trees.
The random forest finds the best split using the Gini index cost function, which is given by:
$$\mathrm{Gini} = 1 - \sum_{i=1}^{c} p_i^2$$
where p_i is the proportion of samples at the node that belong to class i and c is the
number of classes.
To make a prediction, the first step is to take the test features and use the rules of each
randomly created decision tree to predict the outcome, storing each predicted outcome
(target). Secondly, calculate the votes for each predicted target, and finally take the
highest-voted predicted target as the final prediction of the random forest algorithm.
Random Forest gives accurate predictions for a wide range of applications.
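Two ingredients of the procedure above, scoring a candidate split and combining the trees' votes, can be sketched in plain Python. The sketch assumes the standard Gini impurity (one minus the sum of squared class proportions), since the formula itself did not survive in the text above; the function names and sample labels are made up for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """One minus the sum of squared class proportions at a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a candidate split (lower is better)."""
    n = len(left) + len(right)
    return (len(left) / n * gini_impurity(left)
            + len(right) / n * gini_impurity(right))


def majority_vote(tree_predictions):
    """Final forest prediction: the class predicted by most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Made-up labels for illustration only.
pure_split = split_gini([0, 0, 0], [1, 1, 1])
mixed_split = split_gini([0, 1, 0], [1, 0, 1])
forest_prediction = majority_vote([1, 0, 1, 1, 0])
```

A split that separates the classes perfectly scores 0, while the mixed split scores 4/9, so the first would be chosen. Three of the five hypothetical trees vote for class 1, which becomes the forest's prediction.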
3.1.3 Responsibilities
Projects are initiated to solve a problem. If a project does not solve the problem it is
intended to solve, it is of no use. The developers should also work according to their
plans; otherwise, the project might go off track. The developers should thoroughly
understand their responsibilities.
Responsibilities of Supervisor
o Schedule the project
o Schedule Tracking
o Share information
o Documentation
Histograms
The value of Pearson's Correlation Coefficient lies between -1 and +1. A value of +1 or -1
means that the two variables are perfectly positively or negatively correlated, and 0 means
no linear correlation.
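The coefficient can be computed from its definition, the covariance divided by the product of the standard deviations. The plain-Python sketch below uses made-up values; the function and variable names are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only.
glucose = [90, 110, 130, 150, 170]
rising = [1, 2, 3, 4, 5]
falling = [5, 4, 3, 2, 1]

r_up = pearson(glucose, rising)
r_down = pearson(glucose, falling)
```

The first pair rises together and gives a coefficient of +1; the second pair moves in opposite directions and gives -1. A heat map of such coefficients for every pair of attributes is what the correlation plot visualizes.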
A heat map is a two-dimensional representation of information with the help of colors. Heat
maps can help the user visualize simple or complex information.
In this process, the data is divided into the X (features) and y (target) variables as shown below:
Stratify property in train test split
The stratify parameter makes a split so that the proportion of values in the sample
produced is the same as the proportion of values in the variable passed to stratify. For
example, if variable y is a binary categorical variable with values 0 and 1 and there are
25% zeros and 75% ones, stratify=y will make sure that your random split has 25% 0's and
75% 1's.
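The effect of stratification can be reproduced with a short plain-Python sketch that splits each class separately. It mirrors the idea behind passing stratify=y to scikit-learn's train_test_split but is not that function's implementation; the function name, the seed and the sample data are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(X, y, test_size=0.2, seed=0):
    """Split so that each class keeps the same proportion in both parts."""
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def pick(idx, seq):
        return [seq[i] for i in idx]

    return (pick(train_idx, X), pick(test_idx, X),
            pick(train_idx, y), pick(test_idx, y))


# 25% zeros and 75% ones, as in the example above; feature values are made up.
y = [0] * 25 + [1] * 75
X = [[i] for i in range(100)]
X_train, X_test, y_train, y_test = stratified_split(X, y, test_size=0.2)
```

With an 80/20 split of these 100 labels, the test part holds 5 zeros and 15 ones and the training part holds 20 zeros and 60 ones, so both keep the original 25/75 ratio.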
Logistic Regression
The figure above shows the comparison between accuracy of different models. Among the 4
models, Logistic Regression has the highest accuracy of 79.8%.
The diagram below shows the web application architecture. The user loads the homepage, enters the
data and submits it. The client side sends a request to the web server, which fetches the trained
model, computes the prediction and returns the result to the user.
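The server-side step of this flow, turning the submitted form into a feature row and asking the model for a prediction, can be sketched as a plain Python function. This is a hypothetical illustration and not the project's Flask code: the field names, the messages and the StubModel class are assumptions, and in the real application the function body would sit inside a Flask route and the model would be the saved Logistic Regression classifier.

```python
# Hypothetical form field names, in the order the model expects its features.
FIELDS = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
          "insulin", "bmi", "pedigree", "age"]


class StubModel:
    """Stand-in for the trained classifier; the rule below is made up."""

    def predict(self, rows):
        # Index 1 is the glucose value in FIELDS order.
        return [1 if row[1] >= 140 else 0 for row in rows]


def handle_submission(form, model):
    """Convert submitted form values to a feature row and return the result text."""
    try:
        row = [float(form[name]) for name in FIELDS]
    except (KeyError, ValueError):
        return "Please fill in every field with a number."
    prediction = model.predict([row])[0]
    return "Likely diabetic" if prediction == 1 else "Likely not diabetic"


# Made-up submission for illustration only.
form = {"pregnancies": "2", "glucose": "150", "blood_pressure": "72",
        "skin_thickness": "30", "insulin": "90", "bmi": "31.2",
        "pedigree": "0.5", "age": "45"}
message = handle_submission(form, StubModel())
```

Keeping the parsing and prediction logic in one function that takes the form data and the model as arguments makes it easy to test without starting a web server, and validating the input before calling the model avoids passing missing or non-numeric values to it.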
3.4.3 Heroku
Heroku is a cloud platform as a service (PaaS) supporting several programming languages.
One of the first cloud platforms, Heroku has been in development since June 2007, when it
supported only the Ruby programming language, but now supports Java, Node.js, Scala,
Clojure, Python, PHP, and Go. For this reason, Heroku is said to be a polyglot platform as it
has features for a developer to build, run and scale applications in a similar manner across
most languages. [Wik1]
[Figure: Gantt chart of the project schedule, with a time axis running from 8/29/2019 to
1/10/2021. Tasks shown: Proposal Writing; Draft of chapter 1; Data Collection; Training
dataset and preprocessing; Feature extraction; Draft of chapter 2; Trained data; Testing;
Web Application Development; Draft of chapters 3 & 4; Final Draft.]
In the experimental studies, the dataset was partitioned 80%–20% for training and testing.
Table 3 shows that Logistic Regression, despite being the simplest classifier, performed
well with an accuracy of 79.8%. The ROC curve is plotted for all the algorithms; the more
area under the curve, the better the classifier. These measurements were taken using the
sklearn tool on the Pima Indian Diabetes data set taken from the UCI repository. The
results are shown in Table 4. The results may be improved by using larger, up-to-date data
sets from a realistic context. We would also need to apply other machine learning
algorithms to a real data set before generalizing the results.
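The area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which gives a compact way to compute it. The plain-Python sketch below uses that equivalence with made-up scores; it is an illustration, not the sklearn routine used for the reported values.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
```

In this sample three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75. A classifier that ranks every positive above every negative scores 1.0, and random scoring averages 0.5.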
As the saying goes, “Health is wealth”. The main objective of this application is to predict
the chance of developing diabetes in the near future and to make users more aware of their
health and well-being. The user interface is simple and easily accessible: users input some
values and get the result in an instant.
4.2.1 Screenshots and codes:
Code to display homepage:
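<!-- The input elements and "col" wrappers were lost from the original listing;
     the ones below are assumed for illustration. -->
<div class="row">
  <div class="col">
    <label for="age"><b>Age</b></label>
    <input type="number" id="age" name="age" required>
  </div>
  <div class="col">
    <label for="pregnancies"><b>Pregnancies</b></label>
    <input type="number" id="pregnancies" name="pregnancies" required>
  </div>
</div>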
Code to display output page:
As we worked on the project, both of us had many realizations, as there were lots of
difficulties. The first and foremost problem arose while collecting data. Due to the
current situation, onsite data collection wasn’t an option, so we had to contact hospitals
and research centers via telephone and mail. Some of them responded whereas some didn’t,
and even those who responded did not agree to provide data because of their policies. Using
Google Forms was an alternative, but we didn’t get an adequate amount of data, so we were
bound to use the data from the PIMA Indian Diabetes Database for our project.
Selecting appropriate algorithms for the analysis was also a challenge, but in the end
simple and widely used algorithms were selected and compared based on their performance.
For ease of understanding, only default parameters were passed to the models; to increase
their performance, appropriate parameters would need to be tuned.
For the successful completion of the project, various tools and programming languages were
used. We chose Python as the programming language as it is easy to use, powerful and
versatile, making it a great choice for beginners and experts alike. For model training and
selection, we used Jupyter Notebook, as it allows users to present the analysis step by
step by arranging code, images, text and output in sequence. It also helps users document
their thought process while developing the analysis. The vast selection of Python libraries
and modules such as Numpy, Pandas and Matplotlib allowed us to select and create the model
in a simple way. For web application development, we used Flask, a web framework. As Flask
provides different tools, libraries and technologies, it allowed us to build the web
application easily.
The overall system was designed to make users more conscious of their health and lifestyle.
Users can become aware of their health condition and act accordingly rather than facing the
problem in the future.
Although the project is completed, there are lots of areas where it can be improved. Due to
the present conditions, more features could not be added, and they are left for future
enhancement.
The results in terms of precision for the algorithms Logistic Regression, Support Vector
Machine, K-Nearest Neighbors, and Random Forest Classifier were 0.80, 0.76, 0.74, and 0.72,
respectively.
The results in terms of recall for the algorithms Logistic Regression, Support Vector
Machine, K-Nearest Neighbors, and Random Forest Classifier were 0.78, 0.74, 0.73, and 0.71,
respectively.
The results in terms of F1-score for the algorithms Logistic Regression, Support Vector
Machine, K-Nearest Neighbors, and Random Forest Classifier were 0.79, 0.74, 0.73, and 0.71,
respectively.
The results in terms of AUC for the algorithms Logistic Regression, Support Vector
Machine, K-Nearest Neighbors, and Random Forest Classifier were 0.87, 0.84, 0.82, and 0.76,
respectively.
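For readers unfamiliar with these four metrics, the sketch below computes them from scratch for a binary classifier, using the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 as the harmonic mean of the two, and AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The labels, scores, and 0.5 threshold are hypothetical illustration values, not the project's data, and the report does not state which averaging scheme produced its figures, so this should be read as an assumption about the definitions rather than a reproduction of the results.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_auc(y_true, y_score, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == positive]
    neg = [s for t, s in zip(y_true, y_score) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = diabetic) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(precision, recall, f1, auc)
```

Precision, recall, and F1 depend on the chosen threshold, whereas AUC is computed from the raw scores and summarises performance across all thresholds.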
Comparing these metrics across all the algorithms, we found that Logistic Regression scored
highest on every one of them. We therefore chose Logistic Regression for training our model
and implementing it in our web application.
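The comparison can be made explicit with a short sketch that tabulates the scores reported above and finds the best algorithm for each metric. The scores are the ones stated in this report; the selection rule (pick the algorithm that wins the most metrics) and the variable names are assumptions made for illustration.

```python
from collections import Counter

# Scores as reported in this section.
reported = {
    "Logistic Regression": {"precision": 0.80, "recall": 0.78, "f1": 0.79, "auc": 0.87},
    "Support Vector Machine": {"precision": 0.76, "recall": 0.74, "f1": 0.74, "auc": 0.84},
    "K-Nearest Neighbors": {"precision": 0.74, "recall": 0.73, "f1": 0.73, "auc": 0.82},
    "Random Forest Classifier": {"precision": 0.72, "recall": 0.71, "f1": 0.71, "auc": 0.76},
}
metrics = ["precision", "recall", "f1", "auc"]

# Best algorithm for each metric.
winners = {m: max(reported, key=lambda name: reported[name][m]) for m in metrics}

# Assumed selection rule: the algorithm that wins the most metrics.
best_overall = Counter(winners.values()).most_common(1)[0][0]

for metric, name in winners.items():
    print(f"{metric}: {name} ({reported[name][metric]:.2f})")
print("Selected:", best_overall)
```

Because one algorithm leads on all four metrics here, any reasonable selection rule gives the same choice; the rule would matter only if the metrics disagreed.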
A great deal of effort and research finally led to the completion of this project. The
tremendous support from our supervisor, teachers, friends, and others has been overwhelming.
From the initial brainstorming to the research on similar international projects, it has
been a thorough learning experience for us.
Along with the completion of this project, we gained the essential experience of teamwork.
We had the opportunity to learn many new and interesting tools, programming and design
concepts, and their implementation and usage. Without our teamwork, the project would not
have been possible.
Managing time to complete this project has been a major challenge for us. It would have
been very difficult without the support and consultation provided by our teachers, friends,
and supervisor. Though we have tried our best to make this report efficient, effective, and
purposeful, there may be some shortcomings; advice and suggestions for the correction and
improvement of this project are therefore welcome.